OBSTRUCTIONS TO UNIFORMITY, AND ARITHMETIC 
PATTERNS IN THE PRIMES 



TERENCE TAO 

Abstract. In this expository article, we describe the recent approach, motivated by ergodic theory, towards detecting arithmetic patterns in the primes, and in particular establishing in [26] that the primes contain arbitrarily long arithmetic progressions. One of the driving philosophies is to identify precisely what the obstructions could be that prevent the primes (or any other set) from behaving "randomly", and then either show that the obstructions do not actually occur, or else convert the obstructions into usable structural information on the primes.


1. Introduction

An important class of problems in additive number theory, many of which are still far from being solved, concerns the existence and distribution of affine-linear arithmetic patterns in the primes and almost primes. Some well-known examples of these problems include:

• (Twin prime conjecture) Do there exist infinitely many numbers $n$ such that $n, n+2$ are both prime?

• (Chen's theorem) [5] There exist infinitely many numbers $n$ such that $n$ is prime, and $n+2$ is the product of at most two primes.

• (Sophie Germain prime conjecture) Do there exist infinitely many numbers $n$ such that $n, 2n+1$ are both prime?

• (Goldbach conjecture) For every sufficiently large even number $N$, does there exist an $n$ such that $n$ and $N - n$ are both prime?

• (Vinogradov's theorem) jHIj For every sufficiently large odd number $N$, there exist $n, m$ such that $n$, $m$, and $N - n - m$ are all prime.

• (Hardy–Littlewood prime tuples conjecture) [3T] For any integers $a_1, \ldots, a_k$ which do not fill out all the residue classes of $\mathbb{Z}/p\mathbb{Z}$ for any prime $p$, there exist infinitely many $n$ such that $n + a_1, \ldots, n + a_k$ are all prime.

• (van der Corput's theorem) jlHj There exist infinitely many positive numbers $a, r$ such that $a, a+r, a+2r$ are all prime.

• (Green–Tao theorem) |2Z| For any $k$, there exist infinitely many positive integers $a, r$ such that $a, a+r, \ldots, a+(k-1)r$ are all prime.

A unifying conjecture that encompasses all of these results is the generalized Hardy–Littlewood prime tuples conjecture, which we now discuss. As is customary in additive number theory, the most convenient way to count patterns in the primes is to introduce the von Mangoldt function $\Lambda: \mathbb{Z} \to \mathbb{R}^+$, defined by setting $\Lambda(n) := \log p$ whenever $n = p^j$ is a power of a prime $p$ for some $j \geq 1$, and $\Lambda(n) := 0$ otherwise (in particular $\Lambda$ vanishes on zero and the negative integers). This function is mostly supported on the primes, and obeys a number of useful properties; for instance, one can encode the unique factorization of the integers via the pleasant identity¹

$$\log n = \sum_{d \mid n} \Lambda(d) \tag{1.1}$$

for all $n \in \mathbb{Z}^+$. Also, the prime number theorem can be phrased succinctly as

$$\mathbb{E}(\Lambda(n) \mid 1 \leq n \leq N) = 1 + o_{N\to\infty}(1) \tag{1.2}$$

where we use $\mathbb{E}(f(n) \mid n \in A)$ to denote the average $\frac{1}{|A|}\sum_{n \in A} f(n)$, and $o_{N\to\infty}(1)$ denotes a quantity that goes to zero² as $N \to \infty$. Thus $\Lambda$ is essentially normalized to have mean 1. More generally, for any modulus $q \geq 1$ and any integer $a$, we have

$$\mathbb{E}(\Lambda(n) \mid 1 \leq n \leq N;\ n = a \pmod q) = \Lambda_{\mathbb{Z}/q\mathbb{Z}}(a) + o_{N\to\infty;q}(1) \tag{1.3}$$

for all sufficiently large $N$, where $o_{N\to\infty;q}(1)$ is a quantity which goes to zero as $N \to \infty$ for any fixed $q$, and the "local von Mangoldt function" $\Lambda_{\mathbb{Z}/q\mathbb{Z}}(a)$ is defined as the function which equals $\frac{q}{\phi(q)}$ when $a$ is coprime to $q$ and $0$ otherwise, with $\phi(q) = |(\mathbb{Z}/q\mathbb{Z})^\times|$ being the Euler totient function; this result follows by combining the prime number theorem (1.2) with Dirichlet's theorem on the distribution of primes in arithmetic progressions. One can also think of (1.3) as an assertion that $\Lambda_{\mathbb{Z}/q\mathbb{Z}}$ is essentially the conditional expectation of $\Lambda$ to the $\sigma$-algebra generated by the residue classes modulo $q$.
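As a quick numerical sanity check of (1.1)–(1.3), here is a minimal Python sketch; the helper names `von_mangoldt_table` and `class_mean` and the cutoff $N = 10^5$ are our own illustrative choices, not taken from the text. It tabulates $\Lambda$ with a sieve, verifies the identity (1.1) exactly for small $n$, and prints averages that should be close to the predictions of (1.2) and (1.3); at this scale the agreement is only approximate, since the $o(1)$ terms are still visible.

```python
import math

def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for m in range(p * p, limit + 1, p):   # strike out multiples of p
            is_prime[m] = 0
        pk = p
        while pk <= limit:                      # Lambda(p^j) = log p
            lam[pk] = math.log(p)
            pk *= p
    return lam

N = 100_000
lam = von_mangoldt_table(N)

# Identity (1.1): log n equals the sum of Lambda(d) over the divisors d of n.
for n in range(1, 1000):
    divisor_sum = sum(lam[d] for d in range(1, n + 1) if n % d == 0)
    assert abs(math.log(n) - divisor_sum) < 1e-9

def class_mean(q, a):
    """E(Lambda(n) | 1 <= n <= N, n = a mod q)."""
    values = [lam[n] for n in range(1, N + 1) if n % q == a % q]
    return sum(values) / len(values)

print(class_mean(1, 0))   # (1.2): close to 1
print(class_mean(4, 1))   # (1.3): close to 4/phi(4) = 2
print(class_mean(4, 2))   # (1.3): close to 0, since gcd(2, 4) > 1
```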
From the sieve of Eratosthenes, one is led to the heuristic³

$$\Lambda(n) \approx 1_{n>0} \prod_{p < R} \Lambda_{\mathbb{Z}/p\mathbb{Z}}(n)$$

where $1 \ll R \ll n$ is an intermediate quantity between 1 and $n$ that we shall be deliberately vague about specifying⁴. The Chinese remainder theorem then suggests that the local factors $\Lambda_{\mathbb{Z}/p\mathbb{Z}}(n)$ in this product should behave "independently". This leads to the following conjecture:

Conjecture 1.1 (Generalized Hardy–Littlewood prime tuples conjecture). Let $m, t$ be positive integers. For each $1 \leq i \leq m$, let $\psi_i: \mathbb{Z}^t \to \mathbb{Z}$ be an affine-linear form $\psi_i(x_1, \ldots, x_t) = \sum_{j=1}^t L_{ij} x_j + b_i$ for some integers $L_{ij}, b_i$, such that the forms $\psi_i$ are all non-constant, and no two are rational multiples of each other. Let $N$ be a large integer, and assume that $b_i = O(N)$ for all $1 \leq i \leq m$. Then we have

$$\mathbb{E}\Big(\prod_{i=1}^m \Lambda(\psi_i(x)) \,\Big|\, x \in \{1, \ldots, N\}^t\Big) = \alpha_\infty(N) \prod_p \alpha_p + o_{N\to\infty;m,t,L}(1) \tag{1.4}$$



¹ All sums shall be over the positive integers $\mathbb{Z}^+$ unless otherwise indicated.

² Of course, one can make the decay rates much more quantitative, especially if one assumes strong hypotheses such as the Riemann hypothesis. However, our discussion here will not require any quantitative control of $o(1)$ type error terms.

³ If $P$ is a statement, we use $1_P$ to denote the quantity 1 if $P$ is true and 0 if $P$ is false. Similarly, if $A$ is a set, we write $1_A(n)$ for $1_{n \in A}$.

⁴ The original sieve of Eratosthenes requires $R = \sqrt{n}$, but this is problematic for a number of reasons; for instance, Mertens' theorem shows that a further correction term is required. In practice we shall think of $R$ as being somewhat smaller, for instance a small power of $n$.
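where $L := (L_{ij})_{1 \leq i \leq m,\, 1 \leq j \leq t}$, $\alpha_\infty(N)$ is the local density at infinity

$$\alpha_\infty(N) := \mathbb{E}\Big(\prod_{i=1}^m 1_{\psi_i(x) > 0} \,\Big|\, x \in \{1, \ldots, N\}^t\Big)$$

and $\alpha_p$ is the local density at each prime $p$

$$\alpha_p := \mathbb{E}\Big(\prod_{i=1}^m \Lambda_{\mathbb{Z}/p\mathbb{Z}}(\psi_i(x)) \,\Big|\, x \in (\mathbb{Z}/p\mathbb{Z})^t\Big). \tag{1.5}$$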

Remark 1.2. The density $\alpha_\infty(N)$ simply reflects the fact that the primes are positive; this factor is just 1 if all the $L_{ij}$ and $b_i$ are positive. Note that we allow the $b_i$ to depend on $N$, and the error term $o_{N\to\infty;m,t,L}(1)$ is presumed to be independent of the $b_i$; this is necessary in order for this conjecture to encompass such conjectures as Goldbach's conjecture. One can show that $\alpha_p = 1 + O_{m,t,L}(1/p^2)$, and hence the product $\prod_p \alpha_p$ (also known as the singular series) is always convergent. The conjecture is an assertion that the von Mangoldt function $\Lambda(n)$ behaves "randomly", subject to the structural constraints that it must resemble $1_{n>0}$ "locally at infinity" (e.g. in the sense of (1.2)), and must resemble $\Lambda_{\mathbb{Z}/p\mathbb{Z}}$ locally at each prime $p$ (e.g. in the sense of (1.3)). One can also extend the conjecture to polynomial $\psi_i$; this is known as the Bateman–Horn conjecture jl].

This conjecture, if true, would imply all the conjectures and theorems stated earlier. For instance, it predicts

$$\mathbb{E}(\Lambda(n)\Lambda(n+2) \mid 1 \leq n \leq N) = \prod_p \alpha_p + o_{N\to\infty}(1) \tag{1.6}$$

where $\alpha_2 := 2$ and $\alpha_p := 1 - \frac{1}{(p-1)^2}$ for all odd primes $p$. The twin prime constant

$$\Pi_2 := \prod_{p \text{ odd}} \alpha_p = 0.66016\ldots$$

is positive, and (1.6) can then easily be seen to imply the twin prime conjecture. Similarly for the other conjectures and theorems stated earlier.
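The local densities (1.5) can be computed by brute force for small primes. The following Python sketch (the names `local_lambda` and `alpha_p`, and the encoding of a form as a pair of coefficients and constant term, are our own illustrative choices) does this exactly in rational arithmetic, checks the values $\alpha_2 = 2$ and $\alpha_p = 1 - \frac{1}{(p-1)^2}$ quoted above for the twin-prime forms $n, n+2$, and then multiplies the closed-form factors over the odd primes up to $10^5$ to approximate $\Pi_2$.

```python
from fractions import Fraction
from itertools import product
from math import gcd, isqrt

def local_lambda(a, p):
    """Local von Mangoldt function Lambda_{Z/pZ}(a) for a prime p."""
    return Fraction(p, p - 1) if gcd(a, p) == 1 else Fraction(0)

def alpha_p(forms, p):
    """Local density (1.5); each form is (coefficients, constant term)."""
    t = len(forms[0][0])
    total = Fraction(0)
    for x in product(range(p), repeat=t):
        term = Fraction(1)
        for coeffs, b in forms:
            value = sum(c * xi for c, xi in zip(coeffs, x)) + b
            term *= local_lambda(value % p, p)
        total += term
    return total / p**t

twin = [((1,), 0), ((1,), 2)]          # the forms n and n + 2
assert alpha_p(twin, 2) == 2
for p in (3, 5, 7, 11, 13):
    assert alpha_p(twin, p) == 1 - Fraction(1, (p - 1) ** 2)

def primes_up_to(limit):
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if sieve[p]]

# Truncated product for the twin prime constant over odd primes p <= 10^5.
Pi2 = 1.0
for p in primes_up_to(10**5)[1:]:
    Pi2 *= 1 - 1 / (p - 1) ** 2
print(Pi2)   # approximately 0.66016
```

The brute-force `alpha_p` accepts any number of variables $t$, so the same function can be pointed at other patterns, such as the three-term progression forms $a, a+r, a+2r$.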

Of course, this conjecture is still hopelessly out of reach in the general case. However, several partial results are known. The bounds (1.2), (1.3) can already handle the $m = 1$ case of this conjecture, and more generally they can handle any "non-degenerate" case with $m \leq t$. The Hardy–Littlewood circle method, which we discuss below, is roughly speaking able to handle any non-degenerate case with $3 \leq m \leq t+1$ (thus encompassing Vinogradov's theorem and van der Corput's theorem), as well as a few additional cases⁵, but does not seem able to handle the general case. The conjecture is also known to be true if one averages over a suitable subset of the parameters $L_{ij}, b_i$; see 0. In the general case, the technique of upper bound sieves in sieve theory can usually yield an upper bound of $C_{m,t}\,\alpha_\infty(N)\prod_p \alpha_p + o_{N\to\infty;m,t,L}(1)$ for (1.4) for some explicit $C_{m,t}$ (which usually has to be at least 2, thanks to the notorious parity problem); see also Section 2 below. Closely related to this are the results of Goldston and Yildirim, which show that asymptotic formulae such as (1.4) can be recovered (but again with a loss of $C_{m,t}$ on the right-hand side) if one replaces $\Lambda$ with a slightly larger function $\nu$ which is localized

⁵ For instance, by a clever iteration of the circle method, it was established in [2] that for any $k$ there exist infinitely many $k$-tuples of distinct primes $p_1, \ldots, p_k$ such that all the midpoints $(p_i + p_j)/2$ are also prime.
to almost primes (numbers with no small divisors) rather than primes themselves. The ergodic theory-style transference arguments used in [26], [27] can conversely give lower bounds of $c_{m,t}\,\alpha_\infty(N)\prod_p \alpha_p + o_{N\to\infty;m,t,L}(1)$ for some small $0 < c_{m,t} < 1$, but only for linear forms which are homogeneous (no constant term $b_i$) and which are translation invariant, in the sense that they take the form

$$\psi_i(x_1, \ldots, x_t) = x_1 + \tilde\psi_i(x_2, \ldots, x_t).$$

In this special case, which covers the case of arithmetic progressions in the primes, there is also some hope of recovering the full asymptotic (1.4); we discuss this below.

In this expository article we shall discuss these techniques, starting with the prime number theorem (but reinterpreted in the perspective of Goldston–Yildirim majorants), the classical circle method (but reinterpreted in a more "ergodic" perspective), and then turning to long arithmetic progressions in the primes; we also discuss some further recent progress in the case of progressions of length four. In particular we hope to communicate some of the main philosophical ideas underlying the approach in [26], namely:

• Viewing the primes as a dense subset, not of the integers, but instead of a "pseudorandom" set of almost primes (or more precisely, a pseudorandom majorant $\nu$ for the von Mangoldt function $\Lambda$);

• Attacking problems such as (1.4) by locating the "obstructions to uniformity" which could potentially prevent (1.4) from being true;

• Using tools such as conditional expectation to handle these obstructions to uniformity, or tools such as the circle method to show that they do not occur at all.

This is by no means intended to be an exhaustive survey; see for instance [SB] for a 
more in-depth discussion of many of these issues. We will also not give detailed proofs 
for most of the assertions in this survey, referring the reader instead to the original 
papers. 

2. The prime number theorem and enveloping sieves 

We begin with the classical prime number theorem (1.2). The story of this theorem, and its connection to the zeroes of the Riemann zeta function $\zeta(s) := \sum_n \frac{1}{n^s}$, is of course very well known, but we revisit it to make two points. Firstly, as was observed by Chebyshev, one can obtain upper and lower bounds for (1.2) by elementary means (utilizing the pole of $\zeta$ at $s = 1$, but requiring no further knowledge about zeroes or analytic continuation) that are only off by an absolute constant. Secondly, by a refinement of this elementary method one can in fact get asymptotics with $o(1)$ error terms, but at the cost of smoothing out the von Mangoldt function $\Lambda$ and replacing it by a slightly larger variant, namely an enveloping sieve $\nu$ for $\Lambda$. In fact, it turns out that even such results as those in [26], establishing arbitrarily long arithmetic progressions in the primes, can be proven without knowledge of the full prime number theorem (and thus without knowing any non-trivial zero-free region for $\zeta$, or for any other L-function), instead using only⁶ these elementary techniques, albeit in conjunction with a deep and powerful theorem of Szemerédi.

⁶ Of course, the larger the zero-free region that is known for the zeta function, the better the bounds one will obtain on the number of progressions; but if one just wants to obtain the qualitative result that there are infinitely many progressions, no zero-free region beyond the trivial one used here is required.
We begin with the argument of Chebyshev (rephrased here in modern language). If $s$ is any complex number with $\Re(s) > 1$, we may multiply (1.1) by $\frac{1}{n^s}$ and sum in $n$, and make the change of variables $n = dm$, to obtain

$$\sum_n \frac{\log n}{n^s} = \sum_d \frac{\Lambda(d)}{d^s} \sum_m \frac{1}{m^s} = \zeta(s) \sum_d \frac{\Lambda(d)}{d^s}.$$

The left-hand side is $-\zeta'(s)$, and hence we have the standard formula

$$\sum_n \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)}. \tag{2.1}$$

From summation by parts we obtain the bounds

$$\zeta(s) = \frac{1}{s-1} + O(1); \qquad \zeta'(s) = -\frac{1}{(s-1)^2} + O(1) \tag{2.2}$$

when $\Re(s) > 1$ and $s$ is close to 1. In particular, we have a very small zero-free region for $\zeta$ near $s = 1$. We conclude that

$$\sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} + O(1) \tag{2.3}$$

whenever $\Re(s) > 1$ and $s$ is close to 1. This, combined with the trivial observation that $\Lambda$ is non-negative, is already enough to give the elementary bounds

$$c - o_{N\to\infty}(1) \leq \mathbb{E}(\Lambda(n) \mid 1 \leq n \leq N) \leq C + o_{N\to\infty}(1) \tag{2.4}$$

for some absolute constants $0 < c < 1 < C$; for instance the upper bound follows by setting $s := 1 + \frac{1}{\log N}$ in (2.3), while the lower bound follows by setting $s := 1 + \frac{C'}{\log N}$ for some large constant $C'$ and using the upper bound already obtained to eliminate error terms.

The estimate (2.4) is not an asymptotic, of course, since $c \neq C$. However, we can recover good asymptotics by smoothing out the von Mangoldt function $\Lambda$ slightly. We introduce the Möbius function $\mu: \mathbb{Z}^+ \to \{-1, 0, +1\}$, defined by $\mu(n) = (-1)^k$ when $n$ is the product of $k$ distinct primes for some $k \geq 0$, and $\mu(n) = 0$ otherwise. The significance of this function lies in the inclusion-exclusion formula

$$1_{n=1} = \sum_{d \mid n} \mu(d) \tag{2.5}$$

for all $n \in \mathbb{Z}^+$, and hence from (1.1)

$$\begin{aligned}
\Lambda(n) &= 1_{n>0} \sum_{m \mid n} \Lambda(m)\, 1_{n/m = 1} \\
&= 1_{n>0} \sum_{dm \mid n} \Lambda(m)\mu(d) \\
&= 1_{n>0} \sum_{d \mid n} \mu(d) \log\frac{n}{d} \\
&= 1_{n>0} \log n \sum_{d \mid n} \mu(d)\Big(1 - \frac{\log d}{\log n}\Big).
\end{aligned} \tag{2.6}$$
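The identities (2.5) and (2.6) are easy to test directly. The sketch below is plain Python; `mobius`, `von_mangoldt` and `divisors` are hypothetical helper names of ours, computed by naive trial division, which is adequate only for small $n$. It checks both identities for $1 \leq n < 300$.

```python
import math

def mobius(n):
    """Mobius function mu(n) for n >= 1, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divided the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result   # account for a leftover prime factor

def von_mangoldt(n):
    """Lambda(n): log p if n = p^j for a prime p and j >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:                    # p is the smallest prime factor
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)                    # n is prime

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

for n in range(1, 300):
    # (2.5): the sum of mu(d) over d | n is 1 if n = 1 and 0 otherwise
    assert sum(mobius(d) for d in divisors(n)) == (1 if n == 1 else 0)
    # (2.6): Lambda(n) = sum over d | n of mu(d) log(n/d)
    rhs = sum(mobius(d) * math.log(n / d) for d in divisors(n))
    assert abs(von_mangoldt(n) - rhs) < 1e-9
```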
Inspired by this, let us define the truncated von Mangoldt functions $\Lambda_{R,\varphi}: \mathbb{Z} \to \mathbb{R}$ by

$$\Lambda_{R,\varphi}(n) := \log R \sum_{d \mid n} \mu(d)\, \varphi\Big(\frac{\log d}{\log R}\Big) \tag{2.7}$$

where $R > 1$ is a large parameter, and $\varphi: \mathbb{R} \to \mathbb{R}$ is a function supported on the interval $[-1, 1]$. For instance, the von Mangoldt function itself corresponds to the case when $R = n$ and $\varphi(x) := \max(1 - |x|, 0)$. The case when $R < n$ and $\varphi(x) = \max(1 - |x|, 0)$ was studied by Goldston and Yildirim; that case is also related to the Selberg upper bound sieve⁷; see |2Ij for further discussion. These functions are more "localized", and hence easier to analyze, than the original von Mangoldt function, in the sense that they only involve divisors $d$ that are at most $R$⁸.
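Definition (2.7) can be transcribed literally. In the following Python sketch the helper names are hypothetical, $\varphi$ is taken to be the tent function $\max(1-|x|, 0)$ mentioned above, and only divisors $d \leq R$ are summed, because $\varphi$ vanishes outside $[-1,1]$. It checks that taking $R = n$ recovers $\Lambda(n)$, and illustrates the almost-prime property noted in the next paragraph: $\Lambda_{R,\varphi}(n) = \varphi(0)\log R$ when every prime factor of $n$ exceeds $R$.

```python
import math

def mobius(n):
    """Mobius function mu(n) for n >= 1, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def tent(x):
    """phi(x) = max(1 - |x|, 0), supported on [-1, 1]."""
    return max(1.0 - abs(x), 0.0)

def truncated_von_mangoldt(n, R, phi=tent):
    """Lambda_{R,phi}(n) as in (2.7); requires R > 1."""
    log_R = math.log(R)
    total = 0.0
    for d in range(1, min(n, math.floor(R)) + 1):   # phi(log d / log R) = 0 for d > R
        if n % d == 0:
            total += mobius(d) * phi(math.log(d) / log_R)
    return log_R * total

# R = n recovers the von Mangoldt function (n >= 2 so that log R > 0).
for n in range(2, 500):
    assert abs(truncated_von_mangoldt(n, n) - von_mangoldt(n)) < 1e-9

# 143 = 11 * 13 has no prime factor <= R = 10, so the value is phi(0) log R.
print(truncated_von_mangoldt(143, 10), math.log(10))
```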

The truncated von Mangoldt functions behave somewhat similarly to the von Mangoldt function, but are concentrated on the almost primes rather than the primes themselves. For instance, it is easy to see that $\Lambda_{R,\varphi}(n) = \varphi(0)\log R$ whenever $n$ is a prime larger than $R$, or more generally if $n$ is the product of primes larger than $R$. One can also easily establish a fairly elementary "prime number theorem" for these functions, provided that $R$ is not quite as large as $N$:

Proposition 2.1 (Prime number theorem for $\Lambda_{R,\varphi}$). If $N^\varepsilon \leq R \leq N^{1-\varepsilon}$ for some $\varepsilon > 0$, and $\varphi$ is smooth with $\varphi(0) = 1$ and $\varphi'(0) = -1$, then we have

$$\mathbb{E}(\Lambda_{R,\varphi}(n) \mid 1 \leq n \leq N) = 1 + o_{N\to\infty;\varepsilon,\varphi}(1). \tag{2.8}$$

Proof. We can expand the left-hand side of (2.8) as

$$\log R \sum_{d \leq R} \mu(d)\, \varphi\Big(\frac{\log d}{\log R}\Big)\, \mathbb{E}(1_{d \mid n} \mid 1 \leq n \leq N).$$

From the elementary estimate

$$\mathbb{E}(1_{d \mid n} \mid 1 \leq n \leq N) = \frac{1}{d} + O\Big(\frac{1}{N}\Big)$$

we can thus write the left-hand side of (2.8) as

$$\log R \sum_{d \leq R} \frac{\mu(d)}{d}\, \varphi\Big(\frac{\log d}{\log R}\Big) + O_\varphi\Big(\frac{R \log R}{N}\Big).$$

Here the subscripting of $O$ by $\varphi$ denotes that the implied constant is allowed to depend on $\varphi$. Since $\varphi$ is supported on $[-1, 1]$, we may remove the restriction $d \leq R$. Since we are taking $R \leq N^{1-\varepsilon}$, the error term here is $o_{N\to\infty;\varepsilon,\varphi}(1)$. Since we also take $R \geq N^\varepsilon$, it thus suffices to show that

$$\log R \sum_d \frac{\mu(d)}{d}\, \varphi\Big(\frac{\log d}{\log R}\Big) = 1 + o_{R\to\infty;\varphi}(1). \tag{2.9}$$

⁷ The choice $\varphi(x) = \max(1 - |x|, 0)$ will give an optimized value of the relative density between $\Lambda$ and its enveloping sieve, although we will not need such optimization in our arguments. Very recently, however, there has been work of Goldston, Motohashi, Pintz, and Yildirim, which uses precise optimization of higher-dimensional enveloping sieves in order to establish small gaps between primes, thus exploiting enveloping sieves in a rather different way than that discussed here.

⁸ This can be viewed as a manifestation of the uncertainty principle: localizing a function in the spectral or "frequency" sense (i.e. with respect to the divisors $d$) must necessarily cause delocalization in physical space (i.e. with respect to the variables $n$).

To proceed further we need to split $\varphi\big(\frac{\log d}{\log R}\big)$ into expressions which are multiplicative in $d$. This is easiest to establish by Fourier expansion⁹. Since the function $e^x\varphi(x)$ is smooth and compactly supported, we have

$$e^x \varphi(x) = \int_{-\infty}^\infty \psi(t)\, e^{-ixt}\, dt \tag{2.10}$$

for some rapidly decreasing function¹⁰ $\psi$. We truncate this at $|t| = \log^{1/2} R$ (for instance) to obtain

$$e^x \varphi(x) = \int_{|t| \leq \log^{1/2} R} \psi(t)\, e^{-ixt}\, dt + O_{A,\varphi}(\log^{-A} R)$$

for any $A > 0$. In particular, we have

$$\varphi\Big(\frac{\log d}{\log R}\Big) = \int_{|t| \leq \log^{1/2} R} \frac{\psi(t)}{d^{(1+it)/\log R}}\, dt + O_{A,\varphi}\big(d^{-1/\log R} \log^{-A} R\big) \tag{2.11}$$

and hence the left-hand side of (2.9) can be written as

$$\int_{|t| \leq \log^{1/2} R} \log R \sum_d \frac{\mu(d)}{d^{1+(1+it)/\log R}}\, \psi(t)\, dt + O_{A,\varphi}\Big(\log^{1-A} R \sum_d \frac{1}{d^{1+1/\log R}}\Big).$$

By (2.2), $\sum_d d^{-1-1/\log R} = \zeta\big(1 + \frac{1}{\log R}\big) = O(\log R)$, so by taking $A = 3$ (say), we see that the error term is $o_{R\to\infty;\varphi}(1)$ and so can be discarded. As for the main term, we first repeat the derivation of (2.1), using (2.5) instead of (1.1), to conclude

$$\sum_d \frac{\mu(d)}{d^s} = \frac{1}{\zeta(s)};$$

by (2.2) we thus have

$$\sum_d \frac{\mu(d)}{d^s} = s - 1 + O(|s-1|^2)$$

when $\Re(s) > 1$ and $s$ is sufficiently close to 1. Setting $s = 1 + \frac{1+it}{\log R}$ for some $|t| \leq \log^{1/2} R$, we obtain (for $N$, and hence $R$, sufficiently large)

$$\sum_d \frac{\mu(d)}{d^{1+(1+it)/\log R}} = \frac{1+it}{\log R} + O\Big(\frac{1+|t|^2}{\log^2 R}\Big).$$

Inserting this bound into the previous computations, and using the rapid decay of $\psi$, we can thus write the left-hand side of (2.9) as

$$\int_{|t| \leq \log^{1/2} R} (1+it)\,\psi(t)\, dt + o_{R\to\infty;\varphi}(1).$$

Using the rapid decay of $\psi$ again, we can write this as

$$\int_{-\infty}^\infty (1+it)\,\psi(t)\, dt + o_{R\to\infty;\varphi}(1)$$

which we rewrite in turn as

$$\Big(1 - \frac{d}{dx}\Big) \int_{-\infty}^\infty e^{-ixt}\,\psi(t)\, dt\, \Big|_{x=0} + o_{R\to\infty;\varphi}(1).$$

⁹ One could also use contour integration methods here instead of Fourier methods; the two approaches are essentially equivalent.

¹⁰ In other words, $\psi(x) = O_{A,\varphi}((1+|x|)^{-A})$ for all $A > 0$ and $x \in \mathbb{R}$.

Applying (2.10), this becomes

$$\varphi(0) - \big(\varphi(0) + \varphi'(0)\big) + o_{R\to\infty;\varphi}(1) = -\varphi'(0) + o_{R\to\infty;\varphi}(1),$$

and the claim follows from the hypotheses on $\varphi$. □
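The average in (2.8) can be computed exactly for moderate $N$ by the same interchange of summation used at the start of the proof: each $d \leq R$ divides exactly $\lfloor N/d \rfloor$ of the integers in $[1, N]$. The Python sketch below does this with the tent function $\varphi(x) = \max(1-|x|, 0)$. Since that function is not smooth, it falls outside the hypotheses of Proposition 2.1, so this is only an illustration; with this choice the main term is the classical sum $\sum_{d \leq R} \frac{\mu(d)}{d}\log\frac{R}{d}$. The parameters $N = 10^6$, $R = 1000$ and the helper names are arbitrary choices of ours.

```python
import math

def mobius_table(limit):
    """mu[n] for 0 <= n <= limit by a simple sieve (mu[0] is set to 0)."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]          # one more distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # not squarefree
    mu[0] = 0
    return mu

def tent(x):
    return max(1.0 - abs(x), 0.0)

def mean_truncated_lambda(N, R, phi=tent):
    """E(Lambda_{R,phi}(n) | 1 <= n <= N), computed exactly by swapping sums."""
    mu = mobius_table(R)
    log_R = math.log(R)
    total = sum(mu[d] * phi(math.log(d) / log_R) * (N // d) for d in range(1, R + 1))
    return log_R * total / N

print(mean_truncated_lambda(10**6, 1000))   # close to 1, cf. (2.8)
```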
One notable drawback of the truncated von Mangoldt functions $\Lambda_{R,\varphi}$ is that, unlike $\Lambda$, it is perfectly possible for $\Lambda_{R,\varphi}(n)$ to be negative. This however can be rectified by replacing $\Lambda_{R,\varphi}$ with the variant

$$\nu = \nu_{R,\varphi} := \frac{1}{\log R}\Lambda_{R,\varphi}^2. \tag{2.12}$$

This function is still large on almost primes; indeed $\nu(n) = \frac{1}{\log R}\Lambda_{R,\varphi}(n)^2 = \varphi(0)^2 \log R$ whenever $n$ is a prime greater than $R$, or a product of primes greater than $R$. In particular, if $\log R \sim \log N$ and $\varphi(0) \sim 1$, then we have the pointwise bound

$$0 \leq \Lambda(n) \leq C\nu(n) \tag{2.13}$$

for all $1 \leq n \leq N$ other than powers of primes $p \leq R$, where $C := \frac{\log N}{\varphi(0)^2 \log R}$. As observed¹¹ by Goldston and Yildirim, we can also modify the above argument to obtain a prime number theorem for $\nu$, although at the cost of reducing the size of $R$:

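Proposition 2.2 (Prime number theorem for $\nu$). If $N^\varepsilon \leq R \leq N^{1/2-\varepsilon}$ for some $\varepsilon > 0$, and $\varphi$ is smooth with $\int_0^\infty |\varphi'(x)|^2\, dx = 1$, then we have

$$\mathbb{E}(\nu(n) \mid 1 \leq n \leq N) = 1 + o_{N\to\infty;\varepsilon,\varphi}(1). \tag{2.14}$$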

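Proof. We repeat the proof of Proposition 2.1. We can expand the left-hand side of (2.14) as

$$\log R \sum_{d, d' \leq R} \mu(d)\mu(d')\, \varphi\Big(\frac{\log d}{\log R}\Big)\varphi\Big(\frac{\log d'}{\log R}\Big)\, \mathbb{E}(1_{d \mid n} 1_{d' \mid n} \mid 1 \leq n \leq N).$$

From the Chinese remainder theorem we have

$$\mathbb{E}(1_{d \mid n} 1_{d' \mid n} \mid 1 \leq n \leq N) = \frac{1}{[d, d']} + O\Big(\frac{1}{N}\Big)$$

where $[d, d']$ is the least common multiple of $d$ and $d'$. The hypothesis $R \leq N^{1/2-\varepsilon}$ allows us to discard the error term as before, leaving us with the task of establishing

$$\log R \sum_{d, d'} \frac{\mu(d)\mu(d')}{[d, d']}\, \varphi\Big(\frac{\log d}{\log R}\Big)\varphi\Big(\frac{\log d'}{\log R}\Big) = 1 + o_{R\to\infty;\varphi}(1).$$

From (2.11) we have

$$\varphi\Big(\frac{\log d}{\log R}\Big)\varphi\Big(\frac{\log d'}{\log R}\Big) = \int_{|t|, |t'| \leq \log^{1/2} R} \frac{\psi(t)\psi(t')}{d^{(1+it)/\log R}\, d'^{(1+it')/\log R}}\, dt\, dt' + O_{A,\varphi}\big((dd')^{-1/\log R} \log^{-A} R\big).$$

Let us first dispose of the error term. This contribution can be bounded by

$$O_{A,\varphi}\Big(\log^{1-A} R \sum_{d, d'} \frac{|\mu(d)||\mu(d')|}{[d, d']\,(dd')^{1/\log R}}\Big).$$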



¹¹ Strictly speaking, these authors only consider the case $\varphi(x) = \max(1 - |x|, 0)$, but the argument extends to general $\varphi$ without difficulty.
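Using unique factorization $\mathbb{Z}^+ = \prod_p p^{\mathbb{Z}_{\geq 0}}$ and the multiplicative nature of the summand, the sum can be expanded as an Euler product

$$\sum_{d, d'} \frac{|\mu(d)||\mu(d')|}{[d, d']\,(dd')^{1/\log R}} = \prod_p \sum_{d, d' \in \{1, p\}} \frac{1}{[d, d']\,(dd')^{1/\log R}}$$

(only $d, d' \in \{1, p\}$ contribute to the local factor at $p$, since $\mu$ vanishes on higher powers of $p$). One can compute

$$\sum_{d, d' \in \{1, p\}} \frac{1}{[d, d']\,(dd')^{1/\log R}} = 1 + O\Big(\frac{1}{p^{1+1/\log R}}\Big) \leq \Big(1 - \frac{1}{p^{1+1/\log R}}\Big)^{-O(1)}.$$

On the other hand, from (2.2) and the Euler product

$$\zeta(s) = \prod_p \sum_{j \geq 0} \frac{1}{p^{js}} = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}$$

we have

$$\prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1} = \frac{1}{s-1} + O(1) \tag{2.15}$$

for $\Re(s) > 1$ and $s$ close to 1. From this we see that the total contribution of the error term is $O_{A,\varphi}(\log^{O(1)-A} R)$, which is acceptable since $A$ can be chosen to be large. It remains to control the main term, which is

$$\log R \int_{|t|, |t'| \leq \log^{1/2} R} \Big(\sum_{d, d'} \frac{\mu(d)\mu(d')}{[d, d']\, d^{(1+it)/\log R}\, d'^{(1+it')/\log R}}\Big)\, \psi(t)\psi(t')\, dt\, dt'. \tag{2.16}$$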

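The expression inside the parentheses can be expanded as an Euler product

$$\prod_p \sum_{d, d' \in \{1, p\}} \frac{\mu(d)\mu(d')}{[d, d']\, d^{(1+it)/\log R}\, d'^{(1+it')/\log R}}$$

which one can compute as

$$\prod_p \Big(1 - \frac{1}{p^{1+(1+it)/\log R}} - \frac{1}{p^{1+(1+it')/\log R}} + \frac{1}{p^{1+(2+it+it')/\log R}}\Big). \tag{2.17}$$

After some Taylor expansion, we can write this as

$$\prod_p \frac{\Big(1 - \frac{1}{p^{1+(1+it)/\log R}}\Big)\Big(1 - \frac{1}{p^{1+(1+it')/\log R}}\Big)}{1 - \frac{1}{p^{1+(2+it+it')/\log R}}}\, \Big(1 + O\Big(\frac{(1+|t|+|t'|)\log p}{p^2 \log R}\Big)\Big).$$

Since $\sum_p \frac{\log p}{p^2}$ is convergent, and $|t|, |t'| \leq \log^{1/2} R = o_{R\to\infty}(\log R)$, we have

$$\prod_p \Big(1 + O\Big(\frac{(1+|t|+|t'|)\log p}{p^2 \log R}\Big)\Big) = 1 + o_{R\to\infty}(1).$$

Applying (2.15), we can thus write (2.17) as

$$\big(1 + o_{R\to\infty}(1)\big)\, \frac{1}{\log R}\, \frac{(1+it)(1+it')}{2+it+it'}.$$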




The contribution of the error term to (2.16) is $o_{R\to\infty;\varphi}(1)$, thanks to the rapid decrease of $\psi$. Hence we are left with the expression

$$\int_{|t|, |t'| \leq \log^{1/2} R} \frac{(1+it)(1+it')}{2+it+it'}\, \psi(t)\psi(t')\, dt\, dt'$$

and by using the rapid decay of $\psi$ again, we see that we will be done as soon as we establish the identity

$$\int_{-\infty}^\infty \int_{-\infty}^\infty \frac{(1+it)(1+it')}{2+it+it'}\, \psi(t)\psi(t')\, dt\, dt' = \int_0^\infty |\varphi'(x)|^2\, dx.$$

Since

$$\frac{1}{2+it+it'} = \int_0^\infty e^{-(2+it+it')x}\, dx = \int_0^\infty e^{-(1+it)x} e^{-(1+it')x}\, dx,$$

the left-hand side can be written as

$$\int_0^\infty \Big(\int_{-\infty}^\infty \psi(t)(1+it)\, e^{-(1+it)x}\, dt\Big)^2\, dx.$$

But by dividing (2.10) by $e^x$ and then differentiating in $x$, we obtain

$$\varphi'(x) = -\int_{-\infty}^\infty \psi(t)(1+it)\, e^{-(1+it)x}\, dt$$

and the claim follows. □
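Here is a similar numerical illustration for $\nu$, again with the non-smooth tent function, for which $\int_0^\infty |\varphi'(x)|^2\,dx = 1$. The Python sketch below uses our own helper names and the arbitrary parameters $N = 2 \cdot 10^5$, $R = 100$. It computes $\Lambda_{R,\varphi}(n)$ for all $n \leq N$ by adding $\mu(d)\varphi(\log d/\log R)$ to the multiples of each $d \leq R$, forms $\nu$ via (2.12), and checks two pointwise facts: $\nu(p) = \varphi(0)^2 \log R$ for primes $p > R$, and $\Lambda(n) \leq C\nu(n)$ with $C = \frac{\log N}{\varphi(0)^2\log R}$ as in (2.13), skipping powers of primes $p \leq R$, where this bound can genuinely fail (for example at $n = 2^{10}$). At this small scale the mean of $\nu$ should only be expected to be roughly 1, since the convergence in (2.14) is slow.

```python
import math

def mu_and_lambda(limit):
    """Tables of the Mobius and von Mangoldt functions on 0..limit."""
    spf = list(range(limit + 1))                # smallest prime factor
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    mu = [0, 1] + [0] * (limit - 1)
    lam = [0.0] * (limit + 1)
    for n in range(2, limit + 1):
        p = spf[n]
        m = n // p
        mu[n] = 0 if m % p == 0 else -mu[m]
        while m % p == 0:
            m //= p
        if m == 1:                               # n is a power of p
            lam[n] = math.log(p)
    return mu, lam

def tent(x):
    return max(1.0 - abs(x), 0.0)

N, R = 200_000, 100
mu, lam = mu_and_lambda(N)
log_R = math.log(R)

# S[n] = sum over d | n with d <= R of mu(d) phi(log d / log R)
S = [0.0] * (N + 1)
for d in range(1, R + 1):
    if mu[d]:
        weight = mu[d] * tent(math.log(d) / log_R)
        for n in range(d, N + 1, d):
            S[n] += weight

nu = [log_R * s * s for s in S]      # nu = Lambda_{R,phi}^2 / log R, cf. (2.12)
mean_nu = sum(nu[1:]) / N
print(mean_nu)                        # roughly 1, cf. (2.14)

C = math.log(N) / (tent(0) ** 2 * log_R)
for n in range(1, N + 1):
    if 0 < lam[n] <= log_R:
        continue                      # skip powers of primes p <= R
    assert lam[n] <= C * nu[n] + 1e-9
```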
It turns out that the above elementary argument is quite flexible, and can also give more sophisticated estimates for $\nu$, similar to (1.4). Indeed we have

Theorem 2.3 (Generalized Hardy–Littlewood prime tuples conjecture for $\nu$). Let $m, t$ be positive integers. For each $1 \leq i \leq m$, let $\psi_i: \mathbb{Z}^t \to \mathbb{Z}$ be an affine-linear form $\psi_i(x_1, \ldots, x_t) = \sum_{j=1}^t L_{ij} x_j + b_i$ for some integers $L_{ij}, b_i$, such that the forms $\psi_i$ are all non-constant, and no two are rational multiples of each other. Let $N$ be a large integer, and assume that $b_i = O(N)$ for all $1 \leq i \leq m$. If $N^\varepsilon \leq R \leq N^{1/(2m)-\varepsilon}$ for some $\varepsilon > 0$, and $\varphi$ is smooth with $\int_0^\infty |\varphi'(x)|^2\, dx = 1$, then we have

$$\mathbb{E}\Big(\prod_{i=1}^m \nu(\psi_i(x)) \,\Big|\, x \in \{1, \ldots, N\}^t\Big) = \prod_p \alpha_p + o_{N\to\infty;m,t,L,\varepsilon,\varphi}(1) \tag{2.18}$$

where $\alpha_p$ was defined in (1.5).

We will not prove this result here, but remark that the proof is a routine extension of that used to prove Proposition 2.2, and very similar results were proven in [IT], [TH], [T5], [2*0], |lo]. One can also obtain moment bounds for $\nu$ in terms of various multilinear integrals involving $\varphi$; see jT7], [TH], ^1] for some computations of this sort. The density at infinity, $\alpha_\infty$, is missing, because $\nu$ extends to the negative integers as well as the positive ones. Note that as the order $m$ of the correlation increases, the range of available $R$ decreases, so if we set $R$ equal to a fixed power of $N$, we only obtain correlations to finitely high order.

In the language of |3Z|, |SH], the function $C\nu$ appearing in (2.13) is an enveloping sieve for the von Mangoldt function $\Lambda$. Results such as Theorem 2.3 establish correlation estimates for this sieve, which in turn automatically imply upper bounds for expressions such as (1.4) which are off by a constant $C_{m,t} > 1$; thus the enveloping sieve can be used as an upper bound sieve, though it has many other uses also, thanks in large part to correlation estimates such as¹² Theorem 2.3. More advanced methods in sieve theory can of course be used to reduce this loss $C_{m,t}$, although the parity problem prevents one from removing this constant entirely by sieve-theoretic methods.

We have asserted earlier that $\nu$ is concentrated on the almost primes, which are coprime to all numbers less than $R$. Let us provide some further evidence of this claim. From (2.10) we have

$$\varphi\Big(\frac{\log d}{\log R}\Big) = \int_{-\infty}^\infty \psi(t)\, d^{-(1+it)/\log R}\, dt$$

and hence by (2.7)

$$\Lambda_{R,\varphi}(n) = \log R \int_{-\infty}^\infty \psi(t) \sum_{d \mid n} \mu(d)\, d^{-(1+it)/\log R}\, dt.$$

We can factorize the sum as an Euler product

$$\sum_{d \mid n} \mu(d)\, d^{-(1+it)/\log R} = \prod_{p \mid n} \big(1 - p^{-(1+it)/\log R}\big)$$

and conclude

$$\Lambda_{R,\varphi}(n) = \log R \int_{-\infty}^\infty \psi(t) \prod_{p \mid n} \big(1 - p^{-(1+it)/\log R}\big)\, dt$$

and similarly by (2.12)

$$\nu(n) = \log R \int_{-\infty}^\infty \int_{-\infty}^\infty \psi(t)\psi(t') \prod_{p \mid n} \big(1 - p^{-(1+it)/\log R}\big)\big(1 - p^{-(1+it')/\log R}\big)\, dt\, dt'.$$

Since $\psi(t)$ is rapidly decreasing, the integral effectively localizes $t$ to be of size $O(1)$. The factor $(1 - p^{-(1+it)/\log R})$ is then close to 0 when $p \ll R$ and oscillates around 1 when $p \gg R$. Thus we expect $\Lambda_{R,\varphi}(n)$ and $\nu(n)$ to be small when $n$ has one or more prime factors $\ll R$, and these quantities should be close to $\log R$ when $n$ is a product of primes $\gg R$, though in some exceptional cases (when the phases of $p^{-(1+it)/\log R}$ align in an unfavourable way) one may expect $\Lambda_{R,\varphi}(n)$ to be somewhat larger than this¹³. Thus we have the rough heuristics

$$\Lambda(n) \approx (\log N)\, 1_P(n); \qquad \nu(n) \approx (\log R)\, 1_{AP}(n) \tag{2.19}$$

for $n \sim N$, where $P$ denotes the primes up to $N$ and $AP$ denotes the almost primes at level $R$ up to $N$ (i.e. the products of primes larger than $R$). Observe that $\Lambda(n)$ and $\nu(n)$ both have average $1 + o_{N\to\infty}(1)$, which thus suggests that $P$ has density about $\frac{\log R}{\log N}$ inside $AP$; one can obtain more precise estimates here using Buchstab's formula. On the other hand, Theorem 2.3 combined with the heuristic (2.19) suggests that the set $AP$ is very nicely distributed if $R = N^\varepsilon$ for some suitably small $\varepsilon$. Thus, in summary, the primes $P$ form a set of positive density ($\sim \varepsilon$) inside the almost primes at level

¹² By modifying the enveloping sieve slightly, one can also get some useful estimates on the Fourier coefficients of $\nu$; see […]. Of course, similar estimates are also known for the Fourier coefficients of $\Lambda$ itself, though the estimates for $\nu$ are simpler and do not require the theory of Siegel zeroes. In particular, the estimates are effective without requiring strong hypotheses such as GRH.

¹³ On the other hand, (2.7) shows that $\Lambda_{R,\varphi}(n)$ can be crudely bounded by $O_\varphi(\tau(n)\log R)$, where $\tau(n) = \sum_{d \mid n} 1$ is the divisor function. As is well known, the divisor function has size $O(\log n)$ on the average, though it can get significantly larger than this for very smooth $n$. However, it is always $O_\varepsilon(n^\varepsilon)$ for any $\varepsilon > 0$, and hence $\Lambda_{R,\varphi}$ and $\nu$ also have this type of bound.
$R = N^\varepsilon$, and the latter set has a well-controlled distribution. This turns out to be a very useful perspective for a number of problems, as it bypasses the difficulty that the primes have only a density of $\frac{1}{\log N} = o_{N\to\infty}(1)$ with respect to the integers $\{1, \ldots, N\}$. Thus the almost primes $AP$ (or more precisely the enveloping sieve $\nu$) form a better majorant for the primes $P$ (or more precisely the von Mangoldt function $\Lambda$) than the integers (or the constant 1).
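The Euler-product factorization $\sum_{d \mid n}\mu(d)d^{-s} = \prod_{p \mid n}(1 - p^{-s})$ used in the heuristic discussion above, and the claim that the local factor is small for $p \ll R$ but close to 1 for $p \gg R$, can both be checked directly. The Python sketch below (helper names and the sample values $R = 1000$, $t = 2.5$ are our own) compares the two sides for a few $n$ at the complex exponent $s = (1+it)/\log R$. It then evaluates the factor $1 - p^{-1/\log R}$ (the case $t = 0$) for the small prime $2$ and for the large prime $10^9 + 7$.

```python
import math

def mobius(n):
    """Mobius function mu(n) for n >= 1, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisor_side(n, s):
    """Sum over d | n of mu(d) d^(-s)."""
    return sum(mobius(d) * d ** (-s) for d in range(1, n + 1) if n % d == 0)

def euler_side(n, s):
    """Product over primes p | n of (1 - p^(-s))."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            result *= 1 - p ** (-s)
            while m % p == 0:
                m //= p
        p += 1
    return result

R, t = 1000.0, 2.5
s = (1 + 1j * t) / math.log(R)
for n in (1, 12, 30, 1001, 2 * 3 * 5 * 7 * 11):
    assert abs(divisor_side(n, s) - euler_side(n, s)) < 1e-12

s0 = 1 / math.log(R)
print(1 - 2 ** (-s0))               # p = 2, far below R: about 0.1
print(1 - (10**9 + 7) ** (-s0))     # p = 10^9 + 7, far above R: about 0.95
```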

3. The W-trick 

As we have seen, any correlation estimate involving $\Lambda$ or $\nu$ will involve a number of local densities $\alpha_p$; these densities ultimately arise from the fact that the projections $\Lambda_{\mathbb{Z}/p\mathbb{Z}}$ of $\Lambda$ to the residue classes modulo $p$ are not constant. Note that, due to the rapid convergence of the product $\prod_p \alpha_p$, it is only the small divisors $p$ for which this non-uniformity is significant. However, if one does not care much about the exact order of decay in the $o(1)$ errors, then there is a cheap trick, which we call the "W-trick", available to essentially eliminate the role of these local factors, so that one only has to deal with functions which are uniform with respect to small divisors.

This trick works as follows. We introduce a new parameter $1 \ll w \ll N$; this will eventually be set to a very slowly growing function of $N$, such as $\log\log N$, although for the purposes of getting qualitative $o(1)$ bounds it is not particularly important what $w$ is. We let $W := \prod_{p < w} p$ be the product of all the primes less than $w$. The prime numbers larger than $w$ will then be distributed in the residue classes $\{Wn + b : n \in \mathbb{Z}\}$, where $b$ is one of the $\phi(W)$ numbers in $\{1, \ldots, W\}$ which are coprime to $W$. For each of these numbers $b$, we introduce the renormalized von Mangoldt function

$$\Lambda_{b \bmod W}(n) := \frac{\phi(W)}{W}\Lambda(Wn + b)$$

and similarly the renormalized truncated von Mangoldt functions

$$\Lambda_{R,\varphi,b \bmod W}(n) := \frac{\phi(W)}{W}\Lambda_{R,\varphi}(Wn + b)$$

and the renormalized enveloping sieve

$$\nu_{b \bmod W}(n) := \frac{\phi(W)}{W}\nu(Wn + b).$$

Then the functions $\Lambda_{b \bmod W}(n)$ behave like $\Lambda$, except that the projections modulo $q$ are now extremely close to 1 for small $q$. Indeed, from (1.3) and the Chinese remainder theorem, one easily verifies

$$\mathbb{E}(\Lambda_{b \bmod W}(n) \mid 1 \leq n \leq N;\ n = a \pmod q) = 1 + o_{N\to\infty;w}(1)$$

for $1 \leq q \leq w$. The analogue of Conjecture 1.1 is then the assertion that

$$\mathbb{E}\Big(\prod_{i=1}^m \Lambda_{b_i \bmod W}(\psi_i(x)) \,\Big|\, x \in \{1, \ldots, N\}^t\Big) = \alpha_\infty(N) \prod_{p > w} \alpha_p + o_{N\to\infty;m,t,L,w}(1) \tag{3.1}$$

whenever $b_1, \ldots, b_m \in \{1, \ldots, W\}$ are coprime to $W$; thus the local factors corresponding to primes less than or equal to $w$ in (1.4) are eliminated, at the cost of letting the $o(1)$ term depend on $w$. Actually it is not hard to see that (1.4) is in fact equivalent to (3.1). In many cases, the remaining local factor $\prod_{p > w} \alpha_p$ is in fact $1 + o_{w\to\infty;m,t,L}(1)$; for instance, this is the case if no two of the linear parts $(L_{ij})_{1 \leq j \leq t}$ of the affine forms $\psi_1, \ldots, \psi_m$ are rational multiples of each other. However, there are some important cases where the remaining local factors are significant. For instance, for $1 \leq a \leq N$, the prime tuples conjecture predicts (abbreviating $\Lambda_{b \bmod W}$ as $\Lambda_b$)

$$\mathbb{E}(\Lambda_b(x)\Lambda_b(x + a) \mid 1 \leq x \leq N) = \prod_{p > w,\ p \mid a}\Big(1 + \frac{1}{p}\Big)\,\big(1 + o_{w\to\infty}(1)\big) + o_{N\to\infty;w}(1).$$

The expression $\tau(a) := \prod_{p > w,\ p \mid a}\big(1 + \frac{1}{p}\big)$ is close to 1 for most $a$; for instance, one can establish the moment estimates

$$\mathbb{E}(\tau^q(a) \mid 1 \leq a \leq N) = O_q(1) \tag{3.2}$$

for all $1 \leq q < \infty$ (indeed one can refine the right-hand side to $1 + o_{w\to\infty;q}(1)$). However, $\tau$ is not bounded, as can be seen by taking $a$ to be the product of a large number of primes, each of which is slightly larger than $w$. Nevertheless it is a good heuristic to view $\prod_{p > w} \alpha_p$ as being close to 1 for "most" choices of forms $\psi_i$.
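A small numerical experiment shows the effect of the W-trick. In the following Python sketch (our own illustrative code; $w = 10$, so that $W = 2\cdot 3\cdot 5\cdot 7 = 210$, together with the residue $b = 1$ and $N = 10^4$, are arbitrary choices), the averages of $\Lambda$ over the residue classes $n = a \pmod q$ vary wildly with $a$. By contrast, the averages of $\Lambda_{b \bmod W}(n) = \frac{\phi(W)}{W}\Lambda(Wn+b)$ are all close to 1 for the small moduli $q \leq 3$ tested.

```python
import math

def prime_sieve(limit):
    """is_prime[m] == 1 iff m is prime, for 0 <= m <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return is_prime

w, b, N = 10, 1, 10_000
W = math.prod(p for p in range(2, w) if all(p % r for r in range(2, p)))  # 210
phi_W = sum(1 for c in range(1, W + 1) if math.gcd(c, W) == 1)             # 48
assert math.gcd(b, W) == 1

limit = W * N + b
is_prime = prime_sieve(limit)
higher_powers = {}                      # Lambda on p^j with j >= 2
for p in range(2, math.isqrt(limit) + 1):
    if is_prime[p]:
        pk = p * p
        while pk <= limit:
            higher_powers[pk] = math.log(p)
            pk *= p

def von_mangoldt(m):
    return math.log(m) if is_prime[m] else higher_powers.get(m, 0.0)

def lambda_b(n):
    """Renormalized von Mangoldt function (phi(W)/W) * Lambda(W n + b)."""
    return phi_W / W * von_mangoldt(W * n + b)

def class_mean(f, q, a):
    values = [f(n) for n in range(1, N + 1) if n % q == a]
    return sum(values) / len(values)

for q in (1, 2, 3):
    for a in range(q):
        print(q, a, round(class_mean(von_mangoldt, q, a), 3),
              round(class_mean(lambda_b, q, a), 3))
```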

Similar considerations apply to the enveloping sieve $\nu$. For instance, one can establish that

$$\mathbb{E}\Big(\prod_{i=1}^m \nu_{b_i}(\psi_i(x)) \,\Big|\, x \in \{1, \ldots, N\}^t\Big) = 1 + o_{w\to\infty;m,t,L}(1) + o_{N\to\infty;m,t,L,w}(1) \tag{3.3}$$

whenever no two linear parts of the affine forms $\psi_1, \ldots, \psi_m$ are rational multiples of each other; this is essentially¹⁴ the linear forms condition verified in [26, Proposition 9.8]. Similarly, one can show¹⁵

$$\mathbb{E}\Big(\prod_{i=1}^m \nu_{b_i}(x + a_i) \,\Big|\, x \in \{1, \ldots, N\}\Big) \leq \sum_{1 \leq i < j \leq m} \tilde\tau(a_i - a_j) \tag{3.4}$$

where $\tilde\tau: \mathbb{Z} \to \mathbb{R}^+$ is a slight variant of $\tau$ which is even and obeys the moment conditions (3.2); this is essentially the correlation condition verified in [26, Proposition 9.10]. Morally, one should think of the right-hand side of (3.4) as being bounded, with only a few exceptions such as when $a_i - a_j$ is zero or very smooth (contains a large number of prime factors larger than $w$).
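The moment bounds (3.2), and the failure of boundedness for $\tau$, are easy to observe numerically. The Python sketch below uses our own function name `tau_w` and the arbitrary choices $w = 10$, $N = 10^4$. It computes $\tau(a) = \prod_{p > w,\, p \mid a}(1 + \frac{1}{p})$ by trial division and prints the first few moments, which stay close to 1. It also evaluates $\tau$ at the product of the eight primes from 11 to 37, where it is already about 1.5; since $\sum_p 1/p$ diverges, multiplying in more such primes makes $\tau$ grow without bound, albeit slowly.

```python
def tau_w(a, w):
    """tau(a) = product of (1 + 1/p) over the primes p > w dividing a."""
    result, m, p = 1.0, a, 2
    while p * p <= m:
        if m % p == 0:
            if p > w:
                result *= 1 + 1 / p
            while m % p == 0:
                m //= p
        p += 1
    if m > 1 and m > w:          # the leftover m > 1 is a prime factor of a
        result *= 1 + 1 / m
    return result

w, N = 10, 10_000
taus = [tau_w(a, w) for a in range(1, N + 1)]
for q in (1, 2, 4):
    print(q, sum(t**q for t in taus) / N)   # E(tau^q): bounded, near 1

smooth = 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37
print(tau_w(smooth, w))                      # about 1.5
```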

The linear forms condition (3.3) asserts that the $\nu_b$ are distributed pseudorandomly throughout $\{1,\ldots,N\}$. More informally, the almost primes $A_R$, when restricted to a coset $\{Wn + b : n \in \mathbb{Z}\}$ with $b$ coprime to $W$, behave pseudorandomly inside each such coset. This is consistent with the heuristics used to support the Hardy-Littlewood prime tuples conjecture, such as Cramér's probabilistic model for the primes. In this context, a useful probabilistic model for $\nu_b(n)$ would be a function which, independently for each $n$, equals $\frac{\phi(W)}{W}\log R$ with probability $\frac{W}{\phi(W)\log R}$ and equals $0$ otherwise. The prime tuples conjecture then asserts that the $\Lambda_b$ also behave in a similarly pseudorandom manner (but with $\log R$ essentially replaced by $\log N$).
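The probabilistic model just described can be simulated directly. The sketch below uses illustrative parameters of our own choosing ($W = 30$, so $\phi(W)/W = 8/30$, and $\log R = 10$). It samples the model and checks two things. First, the mean is close to 1. Second, a sample correlation with shifts $0, 1, 3$, which is the kind of average controlled by (3.3), is also close to 1. This toy model is only a heuristic aid and not the actual sieve weight $\nu_b$.

```python
import random


def cramer_model(N, phi_W_over_W, log_R, seed=0):
    """Sample the (hypothetical) random model of nu_b on {0, ..., N-1}.

    Each value is (phi(W)/W) log R with probability W/(phi(W) log R),
    independently, and 0 otherwise, so every value has expectation exactly 1.
    """
    height = phi_W_over_W * log_R
    assert height > 1, "need (phi(W)/W) log R > 1 for a valid probability"
    rng = random.Random(seed)
    return [height if rng.random() < 1 / height else 0.0 for _ in range(N)]


# Illustrative parameters: W = 30 (phi(W)/W = 8/30) and log R = 10.
nu = cramer_model(100_000, 8 / 30, 10.0, seed=1)
N = len(nu)
mean = sum(nu) / N
# A sample "linear forms" average with shifts 0, 1, 3: close to 1 for independent values.
triple = sum(nu[n] * nu[n + 1] * nu[n + 3] for n in range(N - 3)) / (N - 3)
print(f"mean = {mean:.3f}, E(nu(n) nu(n+1) nu(n+3)) = {triple:.3f}")
```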

The linear forms condition (3.3) shows that the correlations of $\nu_b$ are very close to the correlations of the constant function 1. Thus $\nu_b$ is close to 1 in a "weak" sense. One of the philosophies underlying the work in [26] is a transference principle. Informally, it asserts that many results which are true for functions bounded by the constant function 1 are likely to extend to functions bounded by pseudorandom functions such as $\nu_b$, or by variants such as $\nu_b + 1$.

14 The conditions verified in [26] actually refer to a version of $\nu_b$ adapted to $\mathbb{Z}/N\mathbb{Z}$ rather than $\{1, \ldots, N\}$, but the distinction between the two is rather minor.

15 The diagonal cases $a_i = a_j$ can be treated using the crude bound $\nu(n) = O_\varepsilon(N^\varepsilon)$ for any $\varepsilon > 0$ and $n = O(N)$.

Any counting problem concerning the von Mangoldt function $\Lambda$ can of course be subdivided into counting problems involving the $\Lambda_b$. For instance, suppose one wanted to establish a bound such as

$$\mathbf{E}(\Lambda(a)\cdots\Lambda(a+(k-1)r) \mid 1 \le a, r \le N) \ge c_k - o_{N\to\infty;k}(1)$$

for all $k \ge 1$ and $N \ge 1$, and some $c_k > 0$. This bound is in fact obtained in [26], and it implies that the primes contain arbitrarily long arithmetic progressions. To achieve this bound, it suffices to show that for all $w$ there exists $b \in \{1, \ldots, W\}$ coprime to $W$ such that

$$\mathbf{E}(\Lambda_b(a)\cdots\Lambda_b(a+(k-1)r) \mid 1 \le a, r \le N) \ge c'_k - o_{w\to\infty;k}(1) - o_{N\to\infty;w,k}(1) \tag{3.5}$$

for some other $c'_k > 0$. Indeed, if such a bound were true, it would imply that

$$\mathbf{E}(\Lambda_b(a)\cdots\Lambda_b(a+(k-1)r) \mid 1 \le a, r \le N) \ge c'_k/2$$

(say) whenever $w$ is sufficiently large depending on $k$, and $N$ is sufficiently large depending on $w$ and $k$. But $\Lambda_b$ is a renormalized component of $\Lambda$, obtained using the affine-linear transformation $n \mapsto Wn + b$ (which preserves arithmetic progressions). We then observe that

$$\mathbf{E}(\Lambda(a)\cdots\Lambda(a+(k-1)r) \mid 1 \le a, r \le N) \ge c_{k,W}$$

for some $c_{k,W} > 0$. Fixing $w$ to be a suitably large constant depending only on $k$, we obtain the claim.

This reduction from $\Lambda$ to $\Lambda_b$ is used in [26]. Indeed, (3.5) is established there for all $1 \le b < W$ which are coprime to $W$. In the proof, the only facts needed are the following:

- the bound $0 \le \Lambda_b \le C\nu_b$, which is inherited from (2.3);
- the bound $\mathbf{E}(\Lambda_b(n) \mid 1 \le n \le N) \ge c - o_{N\to\infty;W}(1)$, which comes from (1.3).

In fact, since we only need to establish (3.5) for a single $b$, it is possible to avoid using Dirichlet's theorem altogether. One can simply use the pigeonhole principle to locate a $b$ for which $\Lambda_b(n)$ has large mean. This observation has an interesting application: it extends the result in [26] to obtain arbitrarily long progressions not just in the primes, but in any subset of the primes (or almost primes) of positive relative density.
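The $W$-trick and the pigeonhole step can be seen concretely on a small example. The sketch below is only an illustration: the choices $w = 5$ (so $W = 30$) and $N = 3000$ are ours, and the range is far too small to say anything about asymptotics. It computes the mean of $\Lambda_b(n) = \frac{\phi(W)}{W}\Lambda(Wn+b)$ over $1 \le n \le N$ for every $b$ coprime to $W$. The average over $b$ is essentially $\psi(WN)/(WN) \approx 1$, so by pigeonhole at least one residue does at least as well as this average.

```python
import math


def von_mangoldt_table(M):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= M."""
    spf = list(range(M + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(M) + 1):
        if spf[p] == p:
            for m in range(p * p, M + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (M + 1)
    for n in range(2, M + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:  # n is a power of the prime p
            lam[n] = math.log(p)
    return lam


def w_trick_means(w, N):
    """For each b coprime to W, the mean of Lambda_b(n) = (phi(W)/W) Lambda(W n + b) over 1 <= n <= N."""
    W = math.prod(p for p in range(2, w + 1) if all(p % d for d in range(2, p)))
    residues = [b for b in range(1, W) if math.gcd(b, W) == 1]
    phi_W = len(residues)
    lam = von_mangoldt_table(W * N + W)
    return W, {b: (phi_W / W) * sum(lam[W * n + b] for n in range(1, N + 1)) / N
               for b in residues}


W, means = w_trick_means(w=5, N=3000)
average = sum(means.values()) / len(means)
best_b = max(means, key=means.get)  # pigeonhole: this b does at least as well as the average
print(W, round(average, 3), best_b, round(means[best_b], 3))
```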

In summary, the $W$-trick allows one to easily eliminate the influence of small divisors. The result is functions $\Lambda_b, \nu_b$ which are much more uniformly distributed than their non-renormalized counterparts $\Lambda, \nu$. Of course, the price one pays is that the $o(1)$ error terms, as well as the bounds employed above, deteriorate rather substantially. However, if one is only interested in qualitative results, this trick is essentially cost-free.

4. Fourier obstructions to uniformity 

We now discuss the problem of counting progressions of length three in the primes. This can of course be done by the circle method, and this is essentially what we do here. However, we shall adopt the philosophy of counting progressions by first establishing what the obstructions to uniformity are, and then dealing with these obstructions in some manner. The $W$-trick is already one way to eliminate one obstruction to uniformity, namely irregular distribution when localized to small primes. In the language of the circle method, it allows one to ignore the contribution of the major arcs (except the major arc near 1). We will see other ways to deal with obstructions to uniformity later in this article.

The standard way to count progressions of length three in the primes is to try to obtain asymptotics, or at least bounds, for the average

$$\mathbf{E}(\Lambda(a)\Lambda(a+r)\Lambda(a+2r) \mid 1 \le a, r \le N). \tag{4.1}$$

Indeed, Conjecture 1.1 already predicts an explicit asymptotic for this quantity, and Theorem 2.3 gives an upper bound which is only off by an absolute constant. One would then use the Fourier transform right away to convert this expression into an integral involving an exponential sum such as $\mathbf{E}(\Lambda(n)e(-n\alpha) \mid 1 \le n \le N)$, where $\alpha$ is a real number and $e(x) := e^{2\pi i x}$. This sum would then be estimated in two different ways: one when $\alpha$ lies in a major arc (close to a rational with small denominator), and one when $\alpha$ lies in a minor arc. The minor arc computation is reasonably elementary. It ultimately relies on variants of the identity (2.6), the Cauchy-Schwarz inequality, and some bilinear cancellation in the expression $e(-nm\alpha)$. The major arc computation is somewhat deeper, relying among other things on the Siegel-Walfisz theorem.
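To make the major/minor arc distinction concrete, the sketch below evaluates $\mathbf{E}(\Lambda(n)e(-n\alpha) \mid 1 \le n \le N)$ numerically for a few values of $\alpha$. The range $N = 20000$ and the test frequencies are our own illustrative choices. At a major-arc point $\alpha = a/q$ with small $q$, the sum is close to $\mu(q)/\phi(q)$, which gives $1$, $-1$, $-1/2$ for $q = 1, 2, 3$. At a "generic" irrational such as $\sqrt 2$, the sum is small.

```python
import cmath
import math


def von_mangoldt_table(M):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= M."""
    spf = list(range(M + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(M) + 1):
        if spf[p] == p:
            for m in range(p * p, M + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (M + 1)
    for n in range(2, M + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:
            lam[n] = math.log(p)
    return lam


def exp_sum(lam, N, alpha):
    """E(Lambda(n) e(-n alpha) | 1 <= n <= N), with e(x) = exp(2 pi i x)."""
    return sum(lam[n] * cmath.exp(-2j * math.pi * n * alpha) for n in range(1, N + 1)) / N


N = 20_000
lam = von_mangoldt_table(N)
for label, alpha in [("0", 0.0), ("1/2", 0.5), ("1/3", 1 / 3), ("sqrt(2)", math.sqrt(2))]:
    print(f"alpha = {label:8s} sum = {exp_sum(lam, N, alpha):.3f}")
```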

It turns out that one can proceed in a more elementary fashion if one is not seeking an asymptotic, but only a non-zero lower bound on the quantity (4.1). This is certainly enough to imply the qualitative result that there are infinitely many progressions of length three in the primes. Instead of needing to control the exponential sums of $\Lambda$, one only needs to control the exponential sums of a majorant $\nu$ or $\nu_b$, which is much simpler. However, one does need one additional ingredient, namely Roth's theorem [41]. Roth's original formulation of this theorem asserts that any subset of the integers with positive upper density necessarily contains infinitely many progressions of length three. Varnavides [20] showed that this qualitative version is in fact equivalent to the following more quantitative statement:

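Theorem 4.1 (Quantitative Roth theorem). [...] Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a function such that $0 \le f(n) \le 1$ for all $n \in \mathbb{Z}/N\mathbb{Z}$, and such that $\mathbf{E}(f(n) \mid n \in \mathbb{Z}/N\mathbb{Z}) \ge \delta$ for some $0 < \delta \le 1$. Then we have

$$\mathbf{E}(f(a)f(a+r)f(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \ge c(\delta)$$

for some $c(\delta) > 0$.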

The best value of $c(\delta)$ currently known is $c(\delta) \gg \delta^{C\delta^{-2}}$ for some absolute constant $C$; see [...]. However, the qualitative arguments we give below do not require the exact value of $c(\delta)$. Nor do we need the proof of Theorem 4.1: we may treat it as a "black box". We do remark, however, that the known proofs of this theorem, whether via Fourier analysis, ergodic theory, or graph theory, are extremely instructive. They are very consistent with the philosophy outlined here of detecting obstructions to uniformity and then dealing with each obstruction that occurs. For us, the power of Roth's theorem lies in the fact that very little structural information is demanded of $f$. In particular, no arithmetic structure or Fourier-analytic structure is required, besides the important constraint that $f$ is bounded¹⁶.

16 Indeed, our entire philosophy here is in some sense the polar opposite of the more conventional approach. In that approach, one builds up as much information about the primes (or any other number-theoretic object) as possible, for instance using deep estimates on Dirichlet L-functions, and then uses all this information to attack quantities such as (4.1). In contrast, we adopt a minimalist approach (in the spirit of sieve theory) in which we treat the primes as nothing more than a generic subset of the almost primes with positive relative density, ignoring all the rich arithmetic structure. That this approach works at all is entirely due to the existence of theorems such as Roth's theorem, which apply to all sets of positive density (or bounded functions with large mean). However, as we shall see later, it is possible to blend the two approaches and use deeper facts about the primes to obtain sharper results.

As it stands, Roth's theorem does not directly give any non-trivial lower bound on (4.1), for two reasons. The first (rather trivial) reason is that we have stated Roth's theorem in $\mathbb{Z}/N\mathbb{Z}$ rather than on $\{1, \ldots, N\}$. There are some easy truncation arguments (which we omit) to pass back and forth between these two settings, possibly after modifying $N$ by a factor of 2 or so. The more serious difficulty is that $\Lambda$ is not bounded. If we do normalize $\Lambda$ to be bounded (e.g. by dividing by $\log N$), then $\delta$ becomes too small for Roth's theorem to be of any use. However, as it turns out, it is relatively easy to decompose $\Lambda$ into two parts: a bounded function (to which Roth's theorem applies) and a "uniform" error (which has a negligible impact on (4.1)).

Before we do this, we need to understand exactly which functions have a negligible impact on expressions such as (4.1). To phrase things a little more concretely, let us work in the cyclic group $\mathbb{Z}/N\mathbb{Z}$ instead of the progression $\{1, \ldots, N\}$, taking $N$ to be odd, and consider an expression such as

$$\mathbf{E}(f(a)g(a+r)h(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \tag{4.2}$$

for some functions $f, g, h : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$. To begin the discussion, let us take $f, g, h$ to be bounded in magnitude by 1. For applications to the primes we will eventually need to discard this hypothesis.

Since $f, g, h$ are bounded by 1, it is clear that (4.2) is also bounded in magnitude by 1. However, in many cases (4.2) will be much smaller than 1. For instance, if one of $f, g, h$ is small in some averaged sense, say if the $L^1$ norm $\mathbf{E}(|f(n)| \mid n \in \mathbb{Z}/N\mathbb{Z})$ is small, then (4.2) will be small also. Likewise, (4.2) will be quite small with high probability if one of $f, g, h$ fluctuates randomly, for instance if each $f(n)$ independently takes the value $+1$ or $-1$ with equal probability. Let us informally call a function linearly uniform¹⁷ if the expression (4.2) is necessarily small as soon as at least one of $f, g, h$ is set equal to this function. For instance, functions with small $L^1$ norm, or randomly fluctuating functions, will be linearly uniform. Since (4.2) is linear in each of $f$, $g$, and $h$ separately, we can modify $f$, $g$, or $h$ by a linearly uniform function without significantly affecting (4.2). Linearly uniform functions are therefore "negligible" for the purposes of counting progressions of length three. On the other hand, consider the identity

$$\mathbf{E}(e(\alpha a)e(-2\alpha(a+r))e(\alpha(a+2r)) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = 1,$$

valid for any $\alpha \in \frac{1}{N}\mathbb{Z}$. It shows that the function $n \mapsto e(\alpha n)$ is not linearly uniform. More generally, since

$$\mathbf{E}(f(a)e(-2\alpha(a+r))e(\alpha(a+2r)) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = \mathbf{E}(f(n)e(-\alpha n) \mid n \in \mathbb{Z}/N\mathbb{Z}),$$

any function $f$ which has a large correlation (inner product) with a linear phase function $e(\alpha n)$ will not be linearly uniform. Thus linear phase functions are obstructions to linear uniformity; this may help explain the "linear" in the terminology "linear uniformity".

17 The notation here is due to Gowers [21]. The term "uniform" arises because linearly uniform functions behave like a signed probabilistic point process with the uniform distribution; another possible terminology is "linearly unbiased". Somewhat confusingly, this usage of the word "uniform" is completely different from, and in fact opposed to, the notion of "uniformly bounded". Indeed, we will later rely crucially on the fact that linearly uniform functions can be very far from being uniformly bounded.

The effectiveness of the circle method, at least for counting progressions of length three, ultimately lies in the fact that linear phase functions are the only obstructions to linear uniformity, at least when everything is bounded. Thus, if a bounded function has small correlation with every linear phase function, then it is linearly uniform. More precisely:

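Lemma 4.2. Let $f, g, h : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ be functions bounded by 1 (with $N$ odd, as above), and suppose that

$$|\mathbf{E}(f(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})| \le \varepsilon$$

for some $\varepsilon > 0$ and all $\xi \in \mathbb{Z}/N\mathbb{Z}$. Then we have

$$|\mathbf{E}(f(a)g(a+r)h(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \le \varepsilon.$$

Not coincidentally, Lemma 4.2 is also the first step in the Fourier-analytic proof of Roth's theorem; however, we will not discuss this connection here.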
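Proof. Write $\hat f(\xi) := \mathbf{E}(f(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})$ for all $\xi \in \mathbb{Z}/N\mathbb{Z}$, and similarly for $g$ and $h$. We then have the Fourier inversion formulae

$$f(a) = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} \hat f(\xi)e(\xi a/N); \qquad g(a+r) = \sum_{\lambda \in \mathbb{Z}/N\mathbb{Z}} \hat g(\lambda)e(\lambda(a+r)/N); \qquad h(a+2r) = \sum_{\eta \in \mathbb{Z}/N\mathbb{Z}} \hat h(\eta)e(\eta(a+2r)/N).$$

Substitute these formulae and simplify. Averaging in $a$ forces $\xi + \lambda + \eta = 0$, and averaging in $r$ forces $\lambda + 2\eta = 0$. We eventually obtain the identity

$$\mathbf{E}(f(a)g(a+r)h(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} \hat f(\xi)\,\hat g(-2\xi)\,\hat h(\xi). \tag{4.3}$$

On the other hand, Plancherel's identity and the boundedness of $g$ and $h$ give

$$\sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat g(-2\xi)|^2 \le 1; \qquad \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat h(\xi)|^2 \le 1.$$

The first bound uses the fact that $\xi \mapsto -2\xi$ is a bijection, as $N$ is odd. From the hypothesis on $f$ we have $|\hat f(\xi)| \le \varepsilon$ for all $\xi$. The claim then follows from Hölder's inequality. □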
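A quick numerical sanity check of the argument is sketched below. The odd modulus $N = 31$, the random test functions and the frequency $\alpha = 7/N$ are our own illustrative choices. The script performs four checks:

- it evaluates (4.2) by brute force and compares it with the Fourier-side expression (4.3);
- it confirms the Plancherel bound used in the proof;
- it checks the conclusion of Lemma 4.2, taking $\varepsilon$ to be the largest $|\hat f(\xi)|$;
- it verifies that a linear phase gives the value 1 noted earlier, so it is not linearly uniform.

```python
import cmath
import math
import random


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def fourier(f):
    """hat f(xi) = E(f(n) e(-xi n / N) | n in Z/NZ) for xi = 0, ..., N-1."""
    N = len(f)
    return [sum(f[n] * e(-xi * n / N) for n in range(N)) / N for xi in range(N)]


def trilinear(f, g, h):
    """Brute-force E(f(a) g(a+r) h(a+2r) | a, r in Z/NZ)."""
    N = len(f)
    return sum(f[a] * g[(a + r) % N] * h[(a + 2 * r) % N]
               for a in range(N) for r in range(N)) / N**2


N = 31  # odd, so xi -> -2 xi permutes Z/NZ
rng = random.Random(42)
# Three random test functions with values of magnitude at most 1.
f, g, h = ([rng.random() * e(rng.random()) for _ in range(N)] for _k in range(3))
fh, gh, hh = fourier(f), fourier(g), fourier(h)

direct = trilinear(f, g, h)
via_43 = sum(fh[xi] * gh[(-2 * xi) % N] * hh[xi] for xi in range(N))  # identity (4.3)
eps = max(abs(c) for c in fh)  # smallest epsilon allowed in Lemma 4.2
print(f"|(4.2)| = {abs(direct):.4f} <= eps = {eps:.4f}")

# A linear phase is not linearly uniform: with alpha = 7/N in (1/N)Z the average is 1.
alpha = 7 / N
phase_value = trilinear([e(alpha * n) for n in range(N)],
                        [e(-2 * alpha * n) for n in range(N)],
                        [e(alpha * n) for n in range(N)])
print(f"linear phase average = {phase_value.real:.6f}")
```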
Now we return to the task of estimating (4.1). Applying the $W$-trick to make $\Lambda$ more uniformly distributed, it will suffice to obtain an estimate of the form

$$\mathbf{E}(\Lambda_b(a)\Lambda_b(a+r)\Lambda_b(a+2r) \mid 1 \le a, r \le N) \ge c - o_{w\to\infty}(1) - o_{N\to\infty;w}(1)$$

for some absolute constant $c > 0$. Let us cheat a little by identifying $\{1, \ldots, N\}$ with $\mathbb{Z}/N\mathbb{Z}$, ignoring issues of truncation and wraparound (which are actually not difficult to deal with). We are then faced with establishing a lower bound for

$$\mathbf{E}(\Lambda_b(a)\Lambda_b(a+r)\Lambda_b(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}). \tag{4.4}$$

We would like to use Lemma 4.2 to strip away the linearly uniform components of $\Lambda_b$. However, we are faced with the difficulty that $\Lambda_b$ is not uniformly bounded. Fortunately, we can use the fact that $\Lambda_b$ is majorized by an enveloping sieve $\nu_b$. Actually, we will not quite use the enveloping sieve $\nu_b$ constructed in the previous section. Instead, we use a slight variant which is closely related to the Selberg sieve. This enveloping sieve $\nu_b$ can be written down explicitly, but it is a little messy; see [27] for a definition, together with a full analysis and comparison of these two enveloping sieves. For this expository paper, suffice it to say that we still have the basic majorization

$$0 \le \Lambda_b \le C\nu_b \tag{4.5}$$

and that the Fourier coefficients of the Selberg enveloping sieve $\nu_b$ can be computed very explicitly. For instance, one can show that

$$\hat\nu_b(\xi) = o_{w\to\infty}(1) + o_{N\to\infty;w}(1) \tag{4.6}$$

for all $\xi \in \mathbb{Z}/N\mathbb{Z} \setminus \{0\}$. This and other bounds can be combined with orthogonality arguments such as those used in the large sieve (or in Tomas-Stein restriction theory). The result is a weighted form of the Plancherel theorem, namely that

$$\|\hat f\|_{\ell^p(\mathbb{Z}/N\mathbb{Z})} \ll_p 1 \tag{4.7}$$

whenever $p > 2$ and $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is bounded pointwise by $\nu_b + 1$; see [27] (and also [23]). The key point in these estimates is that no factor of $\log N$ appears on the right-hand side, even though all the $L^q$ moments of $\Lambda$ and $\nu$ (except the $L^1$ moment) contain such a logarithmic factor. Using this estimate, we can obtain a weighted variant of Lemma 4.2:

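Lemma 4.3. [27] Let $f, g, h : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ be functions bounded in magnitude by $\nu_b + 1$, and suppose that

$$|\mathbf{E}(f(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})| \le \varepsilon$$

for some $\varepsilon > 0$ and all $\xi \in \mathbb{Z}/N\mathbb{Z}$. Then we have

$$|\mathbf{E}(f(a)g(a+r)h(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \ll \varepsilon^{1/2}.$$

Proof. From (4.3) and Hölder's inequality we have (for instance)

$$|\mathbf{E}(f(a)g(a+r)h(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \le \|\hat f\|_{\ell^\infty(\mathbb{Z}/N\mathbb{Z})}^{1/2}\,\|\hat f\|_{\ell^{5/2}(\mathbb{Z}/N\mathbb{Z})}^{1/2}\,\|\hat g\|_{\ell^{5/2}(\mathbb{Z}/N\mathbb{Z})}\,\|\hat h\|_{\ell^{5/2}(\mathbb{Z}/N\mathbb{Z})}.$$

By hypothesis, $\|\hat f\|_{\ell^\infty(\mathbb{Z}/N\mathbb{Z})} \le \varepsilon$. The claim now follows from (4.7). □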
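One way to spell out the Hölder step is given below. It uses only (4.3), the fact that $\xi \mapsto -2\xi$ permutes $\mathbb{Z}/N\mathbb{Z}$ for odd $N$, and the exponent triple $(5, \tfrac52, \tfrac52)$, which satisfies $\tfrac15 + \tfrac25 + \tfrac25 = 1$. Note that $\| |\hat f|^{1/2} \|_{\ell^5} = \|\hat f\|_{\ell^{5/2}}^{1/2}$, and that the final step applies (4.7) with $p = 5/2 > 2$.

```latex
\begin{aligned}
\Big|\sum_{\xi} \hat f(\xi)\,\hat g(-2\xi)\,\hat h(\xi)\Big|
 &\le \|\hat f\|_{\ell^\infty}^{1/2} \sum_{\xi} |\hat f(\xi)|^{1/2}\,|\hat g(-2\xi)|\,|\hat h(\xi)| \\
 &\le \|\hat f\|_{\ell^\infty}^{1/2}\,
      \big\| |\hat f|^{1/2} \big\|_{\ell^{5}}\,
      \|\hat g(-2\,\cdot)\|_{\ell^{5/2}}\,
      \|\hat h\|_{\ell^{5/2}} \\
 &= \|\hat f\|_{\ell^\infty}^{1/2}\,\|\hat f\|_{\ell^{5/2}}^{1/2}\,
      \|\hat g\|_{\ell^{5/2}}\,\|\hat h\|_{\ell^{5/2}}
  \;\ll\; \varepsilon^{1/2}.
\end{aligned}
```

Any exponent $q \in (2, 3)$ in place of $5/2$ would work equally well, as long as the three Hölder exponents still sum to 1 and (4.7) applies to each factor.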

Thus, even for functions that are merely bounded by $\nu_b + 1$ instead of by 1, linear phase functions are still the only obstructions to linear uniformity. One can view this as a weak version of Plancherel's theorem, transferred to the enveloping sieve $\nu_b + 1$.

At this point one could try to show that $\Lambda_b$, or more precisely the normalized function $\Lambda_b - 1$, has small correlation with all linear phase functions:

$$\mathbf{E}((\Lambda_b(n) - 1)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z}) = o_{w\to\infty}(1) + o_{N\to\infty;w}(1).$$

Together with Lemma 4.3, this would imply that $\Lambda_b$ can be replaced with 1 in (4.4) with negligible error, and we would conclude that

$$\mathbf{E}(\Lambda_b(a)\Lambda_b(a+r)\Lambda_b(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = 1 + o_{w\to\infty}(1) + o_{N\to\infty;w}(1),$$

which would of course be consistent with the Hardy-Littlewood prime tuples conjecture. This strategy can indeed be carried out, though it requires a Vinogradov-type analysis of exponential sums. It also gives the correct asymptotic for (4.1). Indeed, this is essentially the approach taken by van der Corput when establishing infinitely many progressions of length three in the primes. However, there is a more "low-tech" approach that gives the same qualitative result (but not the asymptotic).

Roughly speaking¹⁸, the idea is as follows. We allow for the possibility that the exponential sums $\mathbf{E}(\Lambda_b(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})$ could be large, thus providing some additional obstructions to uniformity. However, the estimate (4.7) limits the total number of obstructions that could exist. More precisely, introduce a threshold $0 < \varepsilon < 1$, and let $S \subset \mathbb{Z}/N\mathbb{Z}$ denote the exceptional frequencies $\xi$ which obstruct linear uniformity, in the sense that

$$|\mathbf{E}(\Lambda_b(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})| \ge \varepsilon.$$

Then (4.7), applied with $p = 3$ (say), shows that $|S| \ll \varepsilon^{-3}$. The Vinogradov exponential sum technique will eventually show that $S$ consists only of the zero frequency for $w, N$ large enough. We will avoid using this fact, and instead treat $S$ as a set about which the only known information is the cardinality bound. This approach has the advantage of being more flexible. For instance, it also recovers the result of Green [23] that any subset of the primes with positive relative density contains infinitely many progressions of length three.
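The counting argument for $S$ is a Markov-type inequality: each $\xi \in S$ contributes at least $\varepsilon^p$ to $\sum_\xi |\hat\Lambda_b(\xi)|^p$. The sketch below makes this concrete on a toy example. The parameters $W = 30$, $b = 1$, the odd prime $N = 499$, $\varepsilon = 0.25$ and $p = 3$ are all our own illustrative choices. Here $\Lambda_b(n) = \frac{\phi(W)}{W}\Lambda(Wn+b)$ for $0 \le n < N$ is simply viewed as a function on $\mathbb{Z}/N\mathbb{Z}$, with no truncation handling. The zero frequency always lands in $S$, because $\Lambda_b$ has mean close to 1.

```python
import cmath
import math


def von_mangoldt_table(M):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= M."""
    spf = list(range(M + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(M) + 1):
        if spf[p] == p:
            for m in range(p * p, M + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (M + 1)
    for n in range(2, M + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:
            lam[n] = math.log(p)
    return lam


def lambda_b_on_cyclic(W, b, N):
    """Lambda_b(n) = (phi(W)/W) Lambda(W n + b) for n = 0, ..., N-1, viewed on Z/NZ."""
    phi_W = sum(1 for c in range(1, W + 1) if math.gcd(c, W) == 1)
    lam = von_mangoldt_table(W * (N - 1) + b)
    return [(phi_W / W) * lam[W * n + b] for n in range(N)]


def fourier(f):
    """hat f(xi) = E(f(n) e(-xi n / N)), using a table of N-th roots of unity."""
    N = len(f)
    roots = [cmath.exp(-2j * math.pi * k / N) for k in range(N)]  # roots[k] = e(-k/N)
    return [sum(f[n] * roots[(xi * n) % N] for n in range(N)) / N for xi in range(N)]


W, b, N = 30, 1, 499
coeffs = fourier(lambda_b_on_cyclic(W, b, N))
eps, p = 0.25, 3
S = [xi for xi in range(N) if abs(coeffs[xi]) >= eps]
lp_sum = sum(abs(c) ** p for c in coeffs)  # plays the role of the l^p norm in (4.7), to the power p
print(f"|S| = {len(S)}, Markov bound lp_sum / eps^p = {lp_sum / eps**p:.1f}")
```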

The set $S$ represents all the obstructions to uniformity. We can remove these obstructions by the device of conditional expectation. This is a slightly different way of removing non-uniformities from the $W$-trick, though certainly in the same philosophical spirit. One considers the Bohr set $B(S,\rho) \subset \mathbb{Z}/N\mathbb{Z}$ for some small radius $0 < \rho < 1$, defined by

$$B(S,\rho) := \{n \in \mathbb{Z}/N\mathbb{Z} : \|\xi n/N\|_{\mathbb{R}/\mathbb{Z}} < \rho \text{ for all } \xi \in S\},$$

where $\|x\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $x$ to the nearest integer. One should think of this Bohr set as roughly analogous to the subgroup $W\mathbb{Z}$ of $\mathbb{Z}$, so that translates $x + B(S,\rho)$ are the analogues of residue classes modulo $W$. When executing the $W$-trick, we passed to a single residue class. Here, however, we shall proceed in a more "ergodic" fashion, averaging out the effect of each translate $x + B(S,\rho)$. More precisely, we split

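$$\Lambda_b = \Lambda_{b,U^\perp} + \Lambda_{b,U},$$

where $\Lambda_{b,U^\perp}$ is the "anti-linearly-uniform" component

$$\Lambda_{b,U^\perp}(x) := \Lambda_b * \frac{N}{|B(S,\rho)|}1_{B(S,\rho)} * \frac{N}{|B(S,\rho)|}1_{B(S,\rho)}(x),$$

the convolution $f * g$ on $\mathbb{Z}/N\mathbb{Z}$ being defined by

$$f * g(x) := \mathbf{E}(f(n)g(x-n) \mid n \in \mathbb{Z}/N\mathbb{Z}),$$

and where $\Lambda_{b,U}$ is the "linearly uniform" component

$$\Lambda_{b,U} := \Lambda_b - \Lambda_{b,U^\perp}.$$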

The function $\Lambda_{b,U^\perp}$ encapsulates all the obstructions to linear uniformity encountered by $\Lambda_b$. The convolution kernel

$$K := \frac{N}{|B(S,\rho)|}1_{B(S,\rho)} * \frac{N}{|B(S,\rho)|}1_{B(S,\rho)}$$

can be thought of as a sort of "Fejér kernel" adapted to $B(S,\rho)$. A key observation is that, unlike $\Lambda_b$, the function $\Lambda_{b,U^\perp}$ is bounded. Indeed, from the majorization (4.5) we have

$$0 \le \Lambda_{b,U^\perp}(x) \ll \nu_b * K(x),$$

and then, by using the Fourier expansion of $1_{B(S,\rho)}$ and (4.6), one can show that

$$\nu_b * K(x) \le 1 + o_{w\to\infty;|S|,\rho}(1) + o_{N\to\infty;w,|S|,\rho}(1).$$

Since $|S| \ll \varepsilon^{-3}$, we thus have the uniform boundedness

$$0 \le \Lambda_{b,U^\perp}(x) \ll 1 + o_{w\to\infty;\varepsilon,\rho}(1) + o_{N\to\infty;w,\varepsilon,\rho}(1). \tag{4.8}$$

In particular, $\Lambda_{b,U} = \Lambda_b - \Lambda_{b,U^\perp}$ is pointwise bounded by a constant multiple of $\nu_b + 1$. Also, since the kernel $K$ is normalized to have mean 1, we have

$$\mathbf{E}(\Lambda_{b,U^\perp}(x) \mid x \in \mathbb{Z}/N\mathbb{Z}) = \mathbf{E}(\Lambda_b(x) \mid x \in \mathbb{Z}/N\mathbb{Z}) = 1 + o_{w\to\infty}(1) + o_{N\to\infty;w}(1).$$

Thus $\Lambda_{b,U^\perp}$ is bounded, non-negative and has large mean. Roth's theorem can therefore be applied (after renormalizing by a bounded scalar) to conclude

$$\mathbf{E}(\Lambda_{b,U^\perp}(a)\Lambda_{b,U^\perp}(a+r)\Lambda_{b,U^\perp}(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \ge c - o_{w\to\infty;\varepsilon,\rho}(1) - o_{N\to\infty;w,\varepsilon,\rho}(1) \tag{4.9}$$

for some absolute constant $c > 0$.

18 For the detailed rigorous argument, see [27].

The function $\Lambda_{b,U}$ can be regarded as the portion of $\Lambda_b$ remaining after all the obstructions to uniformity have been removed. From the definition of $S$, one can easily show that $\Lambda_{b,U}$ has small correlation with all linear phase functions:

$$|\mathbf{E}(\Lambda_{b,U}(n)e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})| \ll \varepsilon + \rho \quad \text{for all } \xi \in \mathbb{Z}/N\mathbb{Z}.$$

Several applications of Lemma 4.3 then allow us to replace $\Lambda_b$ by $\Lambda_{b,U^\perp}$ with a small error:

$$\mathbf{E}(\Lambda_b(a)\Lambda_b(a+r)\Lambda_b(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = \mathbf{E}(\Lambda_{b,U^\perp}(a)\Lambda_{b,U^\perp}(a+r)\Lambda_{b,U^\perp}(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) + O((\varepsilon + \rho)^{1/2}).$$

Applying (4.9), we conclude that

$$\mathbf{E}(\Lambda_b(a)\Lambda_b(a+r)\Lambda_b(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \ge c/2$$

provided that $\varepsilon, \rho$ are sufficiently small, $w$ is sufficiently large depending on $\varepsilon, \rho$, and $N$ is sufficiently large depending on $\varepsilon, \rho, w$. This is enough to establish infinitely many arithmetic progressions of length three in the primes, and more generally in any subset of the primes with positive relative density. Similar arguments work for other sets that are fairly large and which can be dominated by a suitable enveloping sieve. For instance, in [...] it was shown that there are infinitely many arithmetic progressions $p_1, p_2, p_3$ in the primes such that each of $p_1 + 2, p_2 + 2, p_3 + 2$ is either prime or the product of two primes. This is achieved by combining the arguments above with (a quantitative version of) the famous result of Chen [5] that there are infinitely many primes $p$ such that $p + 2$ is the product of at most two primes.
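The conditional-expectation construction can be illustrated on a toy scale. In the sketch below, the modulus $N = 101$, the radius $\rho = 0.2$ and the "exceptional" set $S = \{0, 3, 7\}$ are hypothetical choices made purely for illustration. A random non-negative function `F` stands in for $\Lambda_b$. The code builds $B(S,\rho)$ and the kernel $K$, and checks the structural facts used above:

- $K$ is non-negative with mean 1;
- $K$ is bounded by $N/|B(S,\rho)|$;
- convolving with $K$ preserves the mean, as in the identity for $\mathbf{E}(\Lambda_{b,U^\perp})$.

```python
import random


def dist_to_nearest_integer(x):
    return abs(x - round(x))


def bohr_set(S, rho, N):
    """B(S, rho) = {n in Z/NZ : ||xi n / N||_{R/Z} < rho for all xi in S}."""
    return [n for n in range(N)
            if all(dist_to_nearest_integer(xi * n / N) < rho for xi in S)]


def convolve(f, g):
    """f * g(x) = E(f(n) g(x - n) | n in Z/NZ)."""
    N = len(f)
    return [sum(f[n] * g[(x - n) % N] for n in range(N)) / N for x in range(N)]


N, rho = 101, 0.2
S = [0, 3, 7]  # hypothetical set of exceptional frequencies
B = set(bohr_set(S, rho, N))
normalized = [N / len(B) if n in B else 0.0 for n in range(N)]  # (N/|B|) 1_B, mean 1
K = convolve(normalized, normalized)  # the Fejer-type kernel

rng = random.Random(0)
F = [rng.uniform(0, 5) for _ in range(N)]  # stand-in for a non-negative function like Lambda_b
F_smoothed = convolve(F, K)  # analogue of Lambda_{b,U^perp}
print(len(B), min(K), max(K))
```

The bound $K \le N/|B(S,\rho)|$ follows from $K(x) = \frac{N}{|B|^2}\,|B \cap (x - B)|$. In the actual argument the much stronger bound $\nu_b * K \le 1 + o(1)$ comes from (4.6), which this toy example does not model.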

5. Quadratic obstructions to uniformity 

Let us now consider the task of counting progressions of length four in the primes, or more precisely of obtaining an asymptotic for

$$\mathbf{E}(\Lambda(a)\Lambda(a+r)\Lambda(a+2r)\Lambda(a+3r) \mid 1 \le a, r \le N).$$

The Hardy-Littlewood prime tuples conjecture predicts that this quantity is equal to $\prod_p \alpha_p + o_{N\to\infty}(1)$, where $\alpha_p$ is the local density

$$\alpha_p := \mathbf{E}(\Lambda_{\mathbb{Z}/p\mathbb{Z}}(a)\Lambda_{\mathbb{Z}/p\mathbb{Z}}(a+r)\Lambda_{\mathbb{Z}/p\mathbb{Z}}(a+2r)\Lambda_{\mathbb{Z}/p\mathbb{Z}}(a+3r) \mid a, r \in \mathbb{Z}/p\mathbb{Z}).$$

To put it another way, the number of progressions $a, a+r, a+2r, a+3r$ of primes with $1 \le a, r \le N$ is predicted to be $\frac{N^2}{\log^4 N}\big(\prod_p \alpha_p + o_{N\to\infty}(1)\big)$. The result of [26] establishes a lower bound

$$\mathbf{E}(\Lambda(a)\Lambda(a+r)\Lambda(a+2r)\Lambda(a+3r) \mid 1 \le a, r \le N) \ge c - o_{N\to\infty}(1)$$

for some absolute constant $c > 0$. This is enough to establish infinitely many progressions of length four in the primes, but does not give the asymptotic. In this section we describe a more recent (though significantly more complicated) approach in [28], [29], [30] which gives the correct asymptotic:

Theorem 5.1. [28], [29], [30] We have

$$\mathbf{E}(\Lambda(a)\Lambda(a+r)\Lambda(a+2r)\Lambda(a+3r) \mid 1 \le a, r \le N) = \prod_p \alpha_p + o_{N\to\infty}(1).$$
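The local factors $\alpha_p$ can be computed exactly by brute force. The definition of $\Lambda_{\mathbb{Z}/p\mathbb{Z}}$ is given earlier in the article and not in this excerpt. The sketch below assumes the standard normalization: $\Lambda_{\mathbb{Z}/p\mathbb{Z}}(n) = \frac{p}{p-1}$ if $p \nmid n$ and $0$ otherwise. Under that assumption, one gets $\alpha_2 = 4$ and $\alpha_3 = 9/8$. For $p \ge 5$, counting the pairs $(a, r)$ whose four terms avoid $0 \bmod p$ gives $(p-1)(p-3)$ pairs, so $\alpha_p = p^2(p-3)/(p-1)^3 = 1 + O(p^{-2})$. In particular, the product $\prod_p \alpha_p$ converges. The code checks this closed form against the brute-force average.

```python
from fractions import Fraction


def local_lambda(n, p):
    """Assumed local von Mangoldt function on Z/pZ: p/(p-1) off the zero class, 0 on it."""
    return Fraction(p, p - 1) if n % p else Fraction(0)


def local_density(p):
    """alpha_p = E(prod_{j=0}^{3} Lambda_{Z/pZ}(a + j r) | a, r in Z/pZ), computed exactly."""
    total = Fraction(0)
    for a in range(p):
        for r in range(p):
            term = Fraction(1)
            for j in range(4):
                term *= local_lambda(a + j * r, p)
            total += term
    return total / (p * p)


for p in (2, 3, 5, 7, 11):
    print(p, local_density(p))
```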

We now sketch the main ideas of the proof of this theorem. Firstly, by the $W$-trick, it will suffice to show that

$$\mathbf{E}(\Lambda_{b_0}(a)\Lambda_{b_1}(a+r)\Lambda_{b_2}(a+2r)\Lambda_{b_3}(a+3r) \mid 1 \le a, r \le N) = 1 + o_{w\to\infty}(1) + o_{N\to\infty;w}(1)$$

for all $b_0, \ldots, b_3$ coprime to $W$. Let us again cheat a little by identifying $\{1, \ldots, N\}$ with $\mathbb{Z}/N\mathbb{Z}$ (ignoring some minor truncation issues), so that we now wish to prove

$$\mathbf{E}(\Lambda_{b_0}(a)\Lambda_{b_1}(a+r)\Lambda_{b_2}(a+2r)\Lambda_{b_3}(a+3r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = 1 + o_{w\to\infty}(1) + o_{N\to\infty;w}(1). \tag{5.1}$$

It is convenient to take $N$ to be a prime. We are thus faced with the problem of understanding quadrilinear expressions such as

$$\mathbf{E}(f(a)g(a+r)h(a+2r)j(a+3r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}); \tag{5.2}$$

to begin the discussion, let us suppose that $f, g, h, j$ are bounded in magnitude by 1.
Let us informally call a function quadratically uniform if the above expression is automatically small whenever one of $f, g, h, j$ is replaced with that function. As in the preceding section, it is easy to see that linear phase functions obstruct quadratic uniformity. However, a new difficulty arises: quadratic phase functions such as $e(\alpha n^2)$ also obstruct quadratic uniformity. This can be seen, for instance, from the identity

$$\mathbf{E}(f(a)e(-3\alpha(a+r)^2)e(3\alpha(a+2r)^2)e(-\alpha(a+3r)^2) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) = \mathbf{E}(f(n)e(-\alpha n^2) \mid n \in \mathbb{Z}/N\mathbb{Z}),$$

valid for $\alpha \in \frac{1}{N}\mathbb{Z}$. It holds because $-3(a+r)^2 + 3(a+2r)^2 - (a+3r)^2 = -a^2$.
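The identity above, and the reason the circle method fails here, can both be checked numerically. In the sketch below, the prime $N = 31$, the frequency $\alpha = 5/N$ and the random bounded test function are our own illustrative choices. The script confirms the identity, and shows that taking $f = e(\alpha n^2)$ makes (5.2) exactly 1. Yet every Fourier coefficient of $e(\alpha n^2)$ has size exactly $N^{-1/2}$, by the classical evaluation of quadratic Gauss sums modulo an odd prime.

```python
import cmath
import math
import random


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def four_ap_average(f, g, h, j):
    """E(f(a) g(a+r) h(a+2r) j(a+3r) | a, r in Z/NZ), by brute force."""
    N = len(f)
    return sum(
        f[a] * g[(a + r) % N] * h[(a + 2 * r) % N] * j[(a + 3 * r) % N]
        for a in range(N)
        for r in range(N)
    ) / N**2


N = 31  # prime
alpha = 5 / N  # alpha in (1/N)Z, so e(alpha n^2) is well defined on Z/NZ
g = [e(-3 * alpha * n * n) for n in range(N)]
h = [e(3 * alpha * n * n) for n in range(N)]
j = [e(-alpha * n * n) for n in range(N)]

rng = random.Random(7)
f = [rng.random() * e(rng.random()) for _ in range(N)]  # |f| <= 1
lhs = four_ap_average(f, g, h, j)
rhs = sum(f[n] * e(-alpha * n * n) for n in range(N)) / N

# Setting f to the quadratic phase itself makes the average exactly 1 ...
q = [e(alpha * n * n) for n in range(N)]
phase_value = four_ap_average(q, g, h, j)
# ... even though every Fourier coefficient of q has size N^(-1/2) (a Gauss sum).
q_hat = [abs(sum(q[n] * e(-xi * n / N) for n in range(N)) / N) for xi in range(N)]
print(abs(phase_value), max(q_hat), N ** -0.5)
```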

More generally, consider any quadratic nilsequence of the form $F(g^n x)$. Here $g \in G$ lives in a 2-step nilpotent Lie group $G$, $x$ lives in a compact quotient¹⁹ $G/\Gamma$ of $G$ by a closed subgroup $\Gamma$, and $F : G/\Gamma \to \mathbb{C}$ is a continuous function. One can show that any such nilsequence will similarly be an obstruction to quadratic uniformity; see [28]. The quadratic phases $e(\alpha n^2)$ are good examples of quadratic nilsequences. Another example is the generalized quadratic phase $e(\lfloor \alpha n \rfloor \lfloor \beta n \rfloor \gamma)$ for real numbers $\alpha, \beta, \gamma$, though strictly speaking one needs to smooth out the greatest integer function $\lfloor x \rfloor$ in order to genuinely obtain a quadratic nilsequence.

19 There is an intriguing superficial similarity between two objects. On one side are the 2-step nilmanifolds $G/\Gamma$ which arise in the analysis of progressions of length 4. On the other side are the cusp manifolds $SL_2(\mathbb{R})/\Gamma$ which appear, for instance, in Kloosterman's refinement of the Hardy-Littlewood circle method (which of course corresponds to the unit circle $\mathbb{R}/\mathbb{Z}$). However, we do not know of a concrete connection between these two different extensions of the circle method.

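The appearance of these quadratic phases shows that the circle method is now insufficient to establish quadratic uniformity: functions such as $e(\alpha n^2)$ can give significant contributions to (5.2) despite having very small Fourier coefficients. However, quadratic uniformity can still be captured by the very useful Gowers uniformity norms²⁰ $U^d(\mathbb{Z}/N\mathbb{Z})$. These are defined recursively for $d = 0, 1, \ldots$ by

$$\|f\|_{U^0(\mathbb{Z}/N\mathbb{Z})} := \mathbf{E}(f(x) \mid x \in \mathbb{Z}/N\mathbb{Z}); \qquad \|f\|_{U^{d+1}(\mathbb{Z}/N\mathbb{Z})} := \mathbf{E}\big(\|T_h f\,\overline{f}\|_{U^d(\mathbb{Z}/N\mathbb{Z})}^{2^d} \,\big|\, h \in \mathbb{Z}/N\mathbb{Z}\big)^{1/2^{d+1}},$$

where $T_h$ is the shift operator $T_h f(x) := f(x+h)$. Thus, for instance,

$$\|f\|_{U^1(\mathbb{Z}/N\mathbb{Z})} = |\mathbf{E}(f(n)\overline{f(n+h)} \mid n, h \in \mathbb{Z}/N\mathbb{Z})|^{1/2} = |\mathbf{E}(f(n) \mid n \in \mathbb{Z}/N\mathbb{Z})|,$$

$$\|f\|_{U^2(\mathbb{Z}/N\mathbb{Z})} = |\mathbf{E}(f(n)\overline{f(n+h_1)}\,\overline{f(n+h_2)}\,f(n+h_1+h_2) \mid n, h_1, h_2 \in \mathbb{Z}/N\mathbb{Z})|^{1/4} = \Big(\sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^4\Big)^{1/4},$$

$$\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})} = \big|\mathbf{E}\big(f(n)\overline{f(n+h_1)}\,\overline{f(n+h_2)}\,\overline{f(n+h_3)}\,f(n+h_1+h_2)\,f(n+h_1+h_3)\,f(n+h_2+h_3)\,\overline{f(n+h_1+h_2+h_3)} \,\big|\, n, h_1, h_2, h_3 \in \mathbb{Z}/N\mathbb{Z}\big)\big|^{1/8}.$$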
The relationship between Gowers uniformity norms and quadratic (or higher order) uniformity is given by the following lemma.

Lemma 5.2 (Generalized von Neumann theorem). Let $k \ge 3$, and let $N > k - 1$ be prime. If $f_0, \ldots, f_{k-1} : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ are bounded in magnitude by 1, then

$$|\mathbf{E}(f_0(a)f_1(a+r)\cdots f_{k-1}(a+(k-1)r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \le \inf_{0 \le i \le k-1} \|f_i\|_{U^{k-1}(\mathbb{Z}/N\mathbb{Z})}.$$

In particular, we have

$$|\mathbf{E}(f_0(a)f_1(a+r)f_2(a+2r)f_3(a+3r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \le \inf_{0 \le i \le 3} \|f_i\|_{U^3(\mathbb{Z}/N\mathbb{Z})}.$$

This lemma can be deduced from $k - 1$ applications of the Cauchy-Schwarz inequality, interspersed with $k - 1$ applications of the van der Corput identity

$$|\mathbf{E}(f(n) \mid n \in \mathbb{Z}/N\mathbb{Z})|^2 = \mathbf{E}(T_h f(n)\overline{f(n)} \mid n, h \in \mathbb{Z}/N\mathbb{Z});$$

we leave the details to the reader (or see [20], [23], [...]).
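Lemma 5.2 with $k = 4$ can be spot-checked numerically. This is a sanity check, not a proof. In the sketch below, the prime $N = 11$, the random seeds and the choice $\alpha = 2/N$ are our own. For random functions bounded by 1, the 4-term average never exceeds the smallest $U^3$ norm. The quadratic phases $e(c\,\alpha n^2)$ with $c = 1, -3, 3, -1$ show that the bound can be attained. For them the average is exactly 1 (the same cancellation as in the identity for $e(\alpha n^2)$ above), and each $U^3$ norm is 1 as well.

```python
import cmath
import math
import random


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def u3_norm(f):
    """||f||_{U^3}: eighth root of the average over the 8 vertices of a 3-dimensional cube."""
    N = len(f)
    c = [z.conjugate() for z in f]
    total = 0
    for n in range(N):
        for h1 in range(N):
            for h2 in range(N):
                for h3 in range(N):
                    total += (f[n] * c[(n + h1) % N] * c[(n + h2) % N] * c[(n + h3) % N]
                              * f[(n + h1 + h2) % N] * f[(n + h1 + h3) % N]
                              * f[(n + h2 + h3) % N] * c[(n + h1 + h2 + h3) % N])
    return abs(total / N**4) ** 0.125


def four_ap_average(fs):
    """E(f0(a) f1(a+r) f2(a+2r) f3(a+3r) | a, r in Z/NZ)."""
    N = len(fs[0])
    total = 0
    for a in range(N):
        for r in range(N):
            term = 1
            for i, fi in enumerate(fs):
                term *= fi[(a + i * r) % N]
            total += term
    return total / N**2


N = 11  # prime and > k - 1 = 3
rng = random.Random(5)
results = []
for trial in range(2):
    fs = [[rng.random() * e(rng.random()) for _ in range(N)] for _ in range(4)]  # |f_i| <= 1
    results.append((abs(four_ap_average(fs)), min(u3_norm(fi) for fi in fs)))
    print(f"trial {trial}: |average| = {results[-1][0]:.4f} <= min U^3 = {results[-1][1]:.4f}")

# Equality case: quadratic phases with coefficients 1, -3, 3, -1.
alpha = 2 / N
quad = [[e(c * alpha * n * n) for n in range(N)] for c in (1, -3, 3, -1)]
quad_value = four_ap_average(quad)
print(f"quadratic phases: average = {quad_value.real:.6f}")
```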

The above lemma shows that functions with small $U^3(\mathbb{Z}/N\mathbb{Z})$ norm are quadratically uniform. As before, this lemma is not directly applicable to finding progressions in the primes, since functions such as $\Lambda_b$ are not bounded. However, $\Lambda_b$ can be bounded by an enveloping sieve $\nu_b$ which obeys the good correlation estimates in (3.3). This allows us to use the following extension of the generalized von Neumann theorem:

Lemma 5.3 (Relative generalized von Neumann theorem). [26] Let $k \ge 3$, and let $N > k - 1$ be prime. Suppose that $f_0, \ldots, f_{k-1} : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ are such that each $f_j$ is bounded by $\nu_{b_j} + 1$ for some $b_j$ coprime to $W$. Then, if $R = N^{c_k}$ for some sufficiently small $c_k$,

$$|\mathbf{E}(f_0(a)\cdots f_{k-1}(a+(k-1)r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \ll_k \inf_j \|f_j\|_{U^{k-1}(\mathbb{Z}/N\mathbb{Z})} + o_{N\to\infty;w,k}(1) + o_{w\to\infty;k}(1).$$

20 These are genuine norms for $d \ge 2$; see [...], [25], [46].

This lemma is more complicated to prove than Lemma 5.2, but it is still primarily an application of the Cauchy-Schwarz inequality; see²¹ [26]. The proof relies heavily on the linear forms estimates (3.3). Note that this generalization of Lemma 5.2 is consistent with the transference principle mentioned earlier.

In light of this lemma, to establish the asymptotic (5.1) it will suffice to show that $\Lambda_b - 1$ is quadratically uniform. To see this, write each $\Lambda_{b_j}$ as $(\Lambda_{b_j} - 1) + 1$ and expand (5.1): the term with all factors equal to 1 gives the main term 1, and Lemma 5.3 controls the remaining terms. More precisely, it suffices to show that

$$\|\Lambda_b - 1\|_{U^3(\mathbb{Z}/N\mathbb{Z})} = o_{N\to\infty;w}(1) + o_{w\to\infty}(1) \tag{5.3}$$

for all $b$ coprime to $W$. This is not easy to do directly. The quantity $\|\Lambda_b - 1\|_{U^3(\mathbb{Z}/N\mathbb{Z})}$ is basically the same type of expression that appears in the Hardy-Littlewood prime tuples conjecture, and is beyond the reach of the circle method. Nevertheless, one can proceed by locating all the obstructions to quadratic uniformity, and then checking that the function $\Lambda_b - 1$ is orthogonal to all of them.

We have already observed that the quadratic nilsequences $F(g^n x)$ are obstructions to quadratic uniformity. Recent developments [33], [5] in ergodic theory strongly suggest²² that these are in fact the only obstructions to quadratic uniformity. Building on the pioneering combinatorial and analytical technology of Gowers [20], a quantitative version of this assertion was obtained in [28]. More precisely:

Theorem 5.4 (Inverse theorem for $U^3(\mathbb{Z}/N\mathbb{Z})$). [28] Let $0 < \eta \le 1$. Then there exists a collection $\mathcal{M}$ of $O_\eta(1)$ triples $(G, \Gamma, F)$, where $G$ is a 2-step nilpotent Lie group, $\Gamma$ is a closed co-compact subgroup of $G$, and $F : G/\Gamma \to \mathbb{C}$ is a smooth function, with the following property. Suppose that $N$ is an odd prime and that $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is bounded by 1 with $\|f\|_{U^3(\mathbb{Z}/N\mathbb{Z})} \ge \eta$. Then there exist a triple $(G, \Gamma, F)$ from this collection, a group element $g \in G$, a point $x \in G/\Gamma$, and a shift $h \in \mathbb{Z}/N\mathbb{Z}$ such that

$$|\mathbf{E}(T_h f(n)F(g^n x) \mid -N/2 < n < N/2)| \gg_\eta 1.$$

One can explicitly describe the collection $\mathcal{M}$. One can also give quantitative bounds on the dimension of $G/\Gamma$, on the smoothness of $F$, and on the dependence of the implied constant on $\eta$; see [28].

The proof of Theorem 5.4 is quite lengthy, using many tools of Gowers in additive combinatorics and Fourier analysis. On the other hand, it may well be that a "softer" proof, without the quantitative bounds, is available via the ergodic-theory methods in [31], [...]. In [30], the results from [26] (more precisely, Theorem 6.2 below) were used to extend Theorem 5.4 to the case when $f$ is merely bounded by $\nu_b + 1$ rather than by 1; again, this is consistent with the transference principle. By applying this extended version of Theorem 5.4, one can prove (5.3) as soon as one demonstrates the asymptotic orthogonality estimate

$$\mathbf{E}((T_h\Lambda_b(n) - 1)F(g^n x) \mid -N/2 < n < N/2) = o_{N\to\infty;w,G,\Gamma,F}(1) + o_{w\to\infty;G,\Gamma,F}(1) \tag{5.4}$$

for all quadratic nilsequences $F(g^n x)$.

21 The argument in |2rij treats the case when all the bj are equal, but one can easily modify it to 
treat the case of distinct bj . 

22 Roughly speaking, the ergodic theory setting corresponds to considering averages such as 
E(/(o)/(a + r)f(a + 2r)/(a + 3r)|l ^ a ^ N, 1 < r < if) where the shift range H goes to infin- 
ity much more slowly than N does. As such, there does not appear to be a direct "correspondence 
principle" between the results in |$4j . [S] and the type of results considered here, but there is certainly 
a very strong analogy between the two. See [HH] for more on the ergodic theory perspective to these 
problems. 
This type of result is essentially an exponential sum estimate on $\Lambda$, and can thus be attacked by the standard Vinogradov-type methods. A model case is the estimate

$$\mathbb{E}((\Lambda_b(n) - 1) e(-\alpha n^2) \mid 1 \le n \le N) = o_{N,w\to\infty}(1)$$

for all $\alpha \in \mathbb{R}$, which was essentially obtained in [?]. The general case of quadratic nilsequences is treated in [?], [?]. In those papers it is convenient to first prove the preliminary estimate

$$\mathbb{E}(\mu(n) F(g^n x) \mid 1 \le n \le N) \ll_{A,F,G,\Gamma} \log^{-A} N$$

for all $A > 0$ whenever $F$ is smooth; see [?]. This can be considered a generalization of Davenport's estimate [?]

$$\mathbb{E}(\mu(n) e(-\alpha n) \mid 1 \le n \le N) \ll_A \log^{-A} N$$

and is proven by broadly similar, though significantly more technical, methods (in particular, Vaughan's identity, a division into major and minor arcs, and Cauchy-Schwarz type arguments to deal with the minor arcs). It is, however, simpler to deal with the Möbius function $\mu(n)$ than with the modified von Mangoldt function $\Lambda_b(n) - 1$, as $\mu$ is bounded, and also obeys a somewhat more pleasant Vaughan identity than $\Lambda$. Using this estimate and some elementary arguments, it is already possible to establish

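$$\mathbb{E}((T^h \Lambda_b(n) - T^h \Lambda_{R,\psi,b}(n)) F(g^n x) \mid -N/2 < n < N/2) = o_{N\to\infty; w, F, G, \Gamma}(1) + o_{w\to\infty; F, G, \Gamma}(1)$$

where $\Lambda_{R,\psi,b}(n) := \frac{\phi(W)}{W} \Lambda_{R,\psi}(Wn + b)$ and $\Lambda_{R,\psi}$ was defined²³ in (2.7); as usual we set $R$ to be a small power of $N$ and $\psi$ to be a suitable cutoff function. By the triangle inequality, it thus remains to verify that

$$\mathbb{E}((T^h \Lambda_{R,\psi,b}(n) - 1) F(g^n x) \mid -N/2 < n < N/2) = o_{N\to\infty; w, F, G, \Gamma}(1) + o_{w\to\infty; F, G, \Gamma}(1).$$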
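Davenport's estimate says that $\mu$ is asymptotically orthogonal to every linear phase $e(-\alpha n)$, uniformly in $\alpha$. The sketch below (plain Python; the values of $N$ and $\alpha$ are arbitrary) computes the normalized exponential sum by brute force. It gives a feel for the cancellation involved, but of course says nothing about the $\log^{-A} N$ rate.

```python
import cmath
import math


def mobius_table(N):
    """mu(n) for 0 <= n <= N by a linear sieve (mu[0] is unused and set to 0)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_composite = [False] * (N + 1)
    primes = []
    for i in range(2, N + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_composite[i * p] = True
            if i % p == 0:          # p^2 divides i * p
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]      # one extra distinct prime factor
    return mu


def mobius_phase_average(mu, alpha):
    """|E(mu(n) e(-alpha n) | 1 <= n <= N)|, where N = len(mu) - 1."""
    N = len(mu) - 1
    total = sum(mu[n] * cmath.exp(-2j * cmath.pi * alpha * n)
                for n in range(1, N + 1) if mu[n])
    return abs(total) / N


mu = mobius_table(100000)
for alpha in (0.0, 0.5, 1 / 3, math.sqrt(2)):
    print(alpha, mobius_phase_average(mu, alpha))
```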

It turns out that the simplest way to do this is to apply the Cauchy-Schwarz inequality (in the spirit of Lemma 5.2 and Lemma ?, relying in particular on the Gowers-Cauchy-Schwarz inequality introduced in [21], which also plays a key role in [26]) to reduce matters to the $U^3$ estimate

$$\|\Lambda_{R,\psi,b} - 1\|_{U^3(\mathbb{Z}/N\mathbb{Z})} = o_{N\to\infty; w}(1) + o_{w\to\infty}(1),$$

which in turn can be established by a Goldston-Yildirim correlation estimate, similar in spirit to (3.3). See [30].

It is entirely possible that the techniques discussed in this section extend to give an 
asymptotic for longer progressions in the primes, though there are serious new difficulties 
that appear (similar to the new difficulties that appear in [21] when compared against 
[20]). We (in joint work with Ben Green) hope to report on this problem in a future
paper. 

6. Ergodic obstructions to uniformity 

In the previous section, we outlined a rather complicated approach that yielded an asymptotic for the number of progressions of length four in the primes. As we already saw in the length three case, though, it can often be significantly easier to establish the weaker result of a non-trivial lower bound for the number of such progressions, using tools such as Roth's theorem. This was achieved in [26], in particular establishing that the primes contain arbitrarily long arithmetic progressions. The argument can be seen as a variant of the above arguments, but in which the "hard" obstructions of nilsequences are replaced by much "softer" obstructions coming from ergodic averages. These soft obstructions are insufficiently explicit to easily allow for establishing asymptotic orthogonality results such as (5.4), but they are still controllable to the extent that one can modify the arguments of Section 4, using the soft obstructions to build generalized Bohr sets with which to split $\Lambda_b$ into a uniform component, which is negligible, and an anti-uniform component, which can be treated by a theorem of Szemeredi.

²³ Actually, any reasonable truncated divisor sum approximation to $\Lambda$ could be used in place of $\Lambda_{R,\psi}$ here.

We turn to the details. The famous theorem of Szemeredi [32] asserts that every subset of the integers of positive density contains arbitrarily long arithmetic progressions. A quantitative version of this theorem, which generalizes Theorem 4.1, is as follows:

Theorem 6.1 (Quantitative Szemeredi theorem). Let $k \ge 1$, and let $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a function such that $0 \le f(n) \le 1$ for all $n \in \mathbb{Z}/N\mathbb{Z}$, and such that $\mathbb{E}(f(n) \mid n \in \mathbb{Z}/N\mathbb{Z}) \ge \delta$ for some $0 < \delta \le 1$. Then we have

$$\mathbb{E}(f(a) f(a+r) \cdots f(a+(k-1)r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \ge c(k,\delta)$$

for some $c(k,\delta) > 0$.

This theorem can be deduced from Szemeredi's original theorem by the averaging argument of Varnavides [?]; see also [?] for a direct proof.

As in Section 4, the task (after applying the W-trick) is to obtain a non-trivial lower bound for

$$\mathbb{E}(\Lambda_b(a) \cdots \Lambda_b(a+(k-1)r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}) \qquad (6.1)$$

where we once again gloss over the distinction between $\mathbb{Z}/N\mathbb{Z}$ and $\{1, \ldots, N\}$ to simplify the discussion. Again, we cannot apply Theorem 6.1 directly because of the unboundedness of $\Lambda_b$. However, we can proceed by establishing the following structure theorem, which decomposes any non-negative function bounded by the enveloping sieve $\nu_b$ into a Gowers uniform component (with small Gowers uniformity norm), a non-negative bounded component, and a small error.

Theorem 6.2 (Structure theorem). [?] Let $k \ge 1$, and let $R = N^{c_k}$ for some sufficiently small $c_k > 0$. Let $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be such that $0 \le f(n) \le \nu_b(n)$. Let $0 < \varepsilon < 1$. Then there exist functions $f_U, f_{U^\perp}: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ such that

$$\|f_U\|_{U^{k-1}(\mathbb{Z}/N\mathbb{Z})} = o_{\varepsilon\to 0;k}(1) \qquad (6.2)$$

and

$$0 \le f_{U^\perp}(n) \le 1 + o_{\varepsilon\to 0;k}(1) + o_{N\to\infty;\varepsilon,k}(1) \qquad (6.3)$$

and

$$0 \le f_U(n) + f_{U^\perp}(n) \le f(n) \qquad (6.4)$$

and

$$\mathbb{E}(f_{U^\perp}(n) \mid n \in \mathbb{Z}/N\mathbb{Z}) = \mathbb{E}(f(n) \mid n \in \mathbb{Z}/N\mathbb{Z}) + o_{\varepsilon\to 0;k}(1). \qquad (6.5)$$
Assuming this theorem, a lower bound for (6.1) can be easily accomplished. By (4.5) we can apply Theorem 6.2 with $f := c\Lambda_b$ for some absolute constant $c > 0$, to obtain a majorization

$$0 \le f_U + f_{U^\perp} \le c\Lambda_b.$$

It then suffices to obtain a lower bound for

$$\mathbb{E}((f_U + f_{U^\perp})(a) \cdots (f_U + f_{U^\perp})(a + (k-1)r) \mid a, r \in \mathbb{Z}/N\mathbb{Z}).$$

Expanding the product, all the terms involving at least one factor of $f_U$ are $o_{\varepsilon\to 0;k}(1) + o_{N\to\infty;\varepsilon,k}(1)$, thanks mainly to (6.2) and Lemma 5.3. The remaining term, involving only $f_{U^\perp}$, is at least $c(k,\delta) - o_{\varepsilon\to 0;k}(1) - o_{N\to\infty;\varepsilon,k}(1)$ for some $\delta > 0$ depending only on $c$, thanks to Theorem 6.1 and (6.5). Setting $\varepsilon$ suitably small, and then $N$ sufficiently large, we obtain a non-trivial lower bound for (6.1).
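The step "terms involving a factor of $f_U$ are negligible" is an instance of a generalized von Neumann inequality. Its simplest case, $k = 3$, has a short Fourier proof: for $N$ odd and $|f_1|, |f_2| \le 1$, one has $|\mathbb{E}(f_0(a) f_1(a+r) f_2(a+2r) \mid a, r \in \mathbb{Z}/N\mathbb{Z})| \le \sup_\xi |\hat f_0(\xi)| \le \|f_0\|_{U^2}$. The sketch below checks this numerically for some bounded test functions (their formulas are arbitrary); it does not cover the unbounded, $\nu_b$-majorized functions that the actual argument has to handle.

```python
import cmath


def ap3_average(f0, f1, f2):
    """E(f0(a) f1(a + r) f2(a + 2r) | a, r in Z/NZ) for lists of length N."""
    N = len(f0)
    return sum(f0[a] * f1[(a + r) % N] * f2[(a + 2 * r) % N]
               for a in range(N) for r in range(N)) / N ** 2


def u2_norm(f):
    """||f||_{U^2} = (sum_xi |hat f(xi)|^4)^(1/4), hat f(xi) = E_n f(n) e(-xi n / N)."""
    N = len(f)
    total = 0.0
    for xi in range(N):
        fhat = sum(f[n] * cmath.exp(-2j * cmath.pi * xi * n / N) for n in range(N)) / N
        total += abs(fhat) ** 4
    return total ** 0.25


N = 21  # any odd modulus works for this inequality
f0 = [((n * n * 7 + 3 * n) % 13) / 6.5 - 1 for n in range(N)]   # values in [-1, 1]
f1 = [((n * 5 + 1) % 11) / 5.5 - 1 for n in range(N)]
f2 = [((n * n + 4) % 9) / 4.5 - 1 for n in range(N)]
print(abs(ap3_average(f0, f1, f2)), u2_norm(f0))  # first value <= second
```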

Thus Theorem 6.2 allows one to transfer Theorem 6.1 to a relative setting, adapted to the enveloping sieve $\nu_b$. A similar argument also allows one to use Theorem 6.2 to transfer Theorem 5.4 to the relative setting; see [30].

It remains to prove Theorem 6.2. Let us fix $f$. The first guess is to take $f_{U^\perp}$ to be the mean of $f$, $f_{U^\perp} := \mathbb{E}(f)$, and then set $f_U := f - f_{U^\perp}$. It is clear that $f_{U^\perp}$ is non-negative, and also

$$f_{U^\perp} = \mathbb{E}(f) \le \mathbb{E}(\nu_b) = 1 + o_{w\to\infty;k}(1) + o_{N\to\infty;w,k}(1).$$

Also we trivially have (6.5) and (6.4). The only difficulty is that we do not necessarily have (6.2): there is no reason why $f_U$ needs to be Gowers uniform (i.e. have small $U^{k-1}(\mathbb{Z}/N\mathbb{Z})$ norm). However, if $f_U$ is not Gowers uniform, it turns out to be possible to locate a precise obstruction which is preventing $f_U$ from being uniform, and to transfer this obstruction from $f_U$ to $f_{U^\perp}$. This may not remove all the non-uniformity from $f_U$, but it will increase the energy $\mathbb{E}(|f_{U^\perp}|^2)$ (the square of the $L^2(\mathbb{Z}/N\mathbb{Z})$ norm) of $f_{U^\perp}$ by a significant amount, and so after iterating this process a finite number of times we will eventually end up with a Gowers uniform $f_U$.

The above type of argument has also been used before in ergodic theory (most notably in Furstenberg's structure theorem [?]), and also in the proof of the Szemeredi regularity lemma [13]; not coincidentally, both of those cited papers concerned Szemeredi's theorem (Theorem 6.1). The argument in Section 4 involving convolution with a Bohr set generated by all the Fourier obstructions to uniformity is also an argument of this type (although in that case one transferred all the obstructions from $f_U$ to $f_{U^\perp}$ at once, rather than one at a time). The main difficulty in executing the above idea is to maintain (6.3) throughout this procedure, i.e. to keep $f_{U^\perp}$ non-negative and bounded by 1 (plus negligible errors). To achieve the non-negativity, the simplest way is to use the machinery of conditional expectation (as is done in Furstenberg's structure theorem, and implicitly in the Szemeredi regularity lemma). To achieve the boundedness, one needs some control on the obstructions to uniformity that one is transferring to $f_{U^\perp}$. In the Fourier-analytic argument, these obstructions are linear phase functions $e(\alpha n)$, and one can use Fourier-analytic control on the enveloping sieve (see (4.6)) to keep $f_{U^\perp}$ bounded. To adapt a similar argument to the general case, one might imagine that one would need a similarly explicit description of these obstructions, for instance using the nilsequences of the preceding section. However, it turns out that one can get by using a much less explicit obstruction to uniformity, first introduced in ergodic theory²⁴.



²⁴ More precisely, the key observation for ergodic theory is that the obstructions to weak mixing (which roughly corresponds to Gowers uniformity) are given by almost periodic functions; more specifically, given any function $f$ which fails to be weakly mixing (so that $\langle T^h f, f\rangle$ does not converge on average to zero), one can construct the non-trivial almost periodic function $F := \lim_{H\to\infty} \mathbb{E}(\langle T^h f, f\rangle T^h f \mid -H \le h \le H)$, which has a positive correlation with $f$. See for instance [14]; for the connection with the Gowers uniformity norms see [34], [35].

In order to make the above strategy rigorous, we need two basic concepts, that of a dual function and that of conditional expectation. The dual function $\mathcal{D}_d f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ of a function $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is defined recursively for $d = 0, 1, 2, \ldots$ by the formula

$$\mathcal{D}_0 f := 1; \qquad \mathcal{D}_{d+1} f := \mathbb{E}(\mathcal{D}_d(f \overline{T^h f})\, T^h f \mid h \in \mathbb{Z}/N\mathbb{Z});$$

thus for instance

$$\begin{aligned}
\mathcal{D}_1 f(n) &= \mathbb{E}(f) \\
\mathcal{D}_2 f(n) &= \mathbb{E}(f(n+h_1) f(n+h_2) \overline{f(n+h_1+h_2)} \mid h_1, h_2 \in \mathbb{Z}/N\mathbb{Z}) \\
&= \mathbb{E}(\langle f, T^h f\rangle\, T^h f(n) \mid h \in \mathbb{Z}/N\mathbb{Z}) \\
&= \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^2 \hat f(\xi) e(\xi n/N) \\
\mathcal{D}_3 f(n) &= \mathbb{E}(f(n+h_1) f(n+h_2) f(n+h_3) \overline{f(n+h_1+h_2) f(n+h_1+h_3) f(n+h_2+h_3)} \\
&\qquad\quad f(n+h_1+h_2+h_3) \mid h_1, h_2, h_3 \in \mathbb{Z}/N\mathbb{Z})
\end{aligned}$$

where $\langle \cdot, \cdot \rangle$ denotes the usual inner product $\langle f, g\rangle = \mathbb{E}(f\overline{g})$, and the Fourier transform is normalized as $\hat f(\xi) := \mathbb{E}(f(n) e(-\xi n/N) \mid n \in \mathbb{Z}/N\mathbb{Z})$. One can easily use induction to verify that

$$\langle f, \mathcal{D}_{k-1} f\rangle = \|f\|_{U^{k-1}(\mathbb{Z}/N\mathbb{Z})}^{2^{k-1}}. \qquad (6.6)$$

Thus if $f$ fails to be Gowers uniform of order $k-1$, it correlates with the dual function $\mathcal{D}_{k-1} f$. These dual functions will serve as our obstructions to Gowers uniformity; they are simple to describe but are not very explicit, as they involve a function $f$ over which we have only limited control. Nevertheless, there is a large amount of averaging contained in the non-linear operator $\mathcal{D}_{k-1}$, which will allow us to obtain satisfactory control on these dual functions.

To proceed further, we need to understand the properties of dual functions better. The first important (and easy) property is that dual functions are always bounded: more precisely, we have $|\mathcal{D}_{k-1} f| \ll_k 1$ whenever $f$ is pointwise bounded by $\nu_b + 1$. Indeed, in such a case we have

$$|\mathcal{D}_{k-1} f| \le \mathcal{D}_{k-1}(\nu_b + 1),$$

and several applications of the pseudorandomness properties of $\nu_b$ give the bound $\mathcal{D}_{k-1}(\nu_b + 1) \ll_k 1$ (see [26]).

The second important (but significantly deeper) property is that a dual function, and more generally any polynomial combination of dual functions, is highly "Gowers anti-uniform" in the sense that it is essentially orthogonal to all Gowers uniform functions, and in particular to the function $\nu_b - 1$ (which can easily be shown to be Gowers uniform, thanks to several applications of (3.3)). Indeed, it turns out that we have

$$\langle \nu_b - 1, P(\mathcal{D}_{k-1}(f_1), \ldots, \mathcal{D}_{k-1}(f_m))\rangle = o_{N\to\infty;m,P,w}(1) + o_{w\to\infty;m,P}(1) \qquad (6.7)$$

for any polynomial $P(x_1, \ldots, x_m)$ of $m$ variables, and any functions $f_1, \ldots, f_m: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ bounded in magnitude by $\nu_b + 1$. This fact is elementary to prove, but not entirely trivial; it is obtained by a large number of applications of the Cauchy-Schwarz and Hölder inequalities, combined with the correlation condition (3.4). See [27].
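For bounded functions on a small cyclic group one can compute dual functions directly. The sketch below (illustrative only; its cost is exponential in $d$, so tiny $N$ only) uses the unwound form of the recursion, $\mathcal{D}_d f(n) = \mathbb{E}(\prod_{\omega \in \{0,1\}^d \setminus \{0\}} \mathcal{C}^{|\omega|-1} f(n + \omega \cdot h) \mid h \in (\mathbb{Z}/N\mathbb{Z})^d)$ with $\mathcal{C}$ denoting complex conjugation, and compares $\langle f, \mathcal{D}_d f\rangle$ with $\|f\|_{U^d}^{2^d}$ as in (6.6). The test function and helper names are arbitrary choices for the sketch.

```python
import cmath
import itertools


def dual_function(f, d):
    """Brute-force D_d f on Z/NZ via the unwound form of the recursion:

    D_d f(n) = E_h prod_{omega in {0,1}^d, omega != 0} C^{|omega|-1} f(n + omega.h),

    with C = complex conjugation (so D_0 f = 1 and D_1 f = E(f)).
    """
    N = len(f)
    cube = [w for w in itertools.product((0, 1), repeat=d) if any(w)]
    values = []
    for n in range(N):
        total = 0j
        for h in itertools.product(range(N), repeat=d):
            term = 1 + 0j
            for omega in cube:
                x = f[(n + sum(w * hi for w, hi in zip(omega, h))) % N]
                # |omega| odd: no conjugate; |omega| even: conjugate.
                term *= x if sum(omega) % 2 else x.conjugate()
            total += term
        values.append(total / N ** d)
    return values


def gowers_power(f, d):
    """||f||_{U^d}^(2^d), computed by brute force."""
    N = len(f)
    cube = list(itertools.product((0, 1), repeat=d))
    total = 0j
    for n in range(N):
        for h in itertools.product(range(N), repeat=d):
            term = 1 + 0j
            for omega in cube:
                x = f[(n + sum(w * hi for w, hi in zip(omega, h))) % N]
                term *= x.conjugate() if sum(omega) % 2 else x
            total += term
    return total / N ** (d + 1)


def inner(f, g):
    """<f, g> = E(f conj(g))."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


def fourier(f):
    """hat f(xi) = E(f(n) e(-xi n / N) | n in Z/NZ)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * xi * n / N) for n in range(N)) / N
            for xi in range(N)]


N = 7
# An arbitrary test function with |f| <= 1.
f = [cmath.exp(2j * cmath.pi * ((3 * n * n + n) % N) / N) * ((n % 3) + 1) / 3 for n in range(N)]
for d in (1, 2, 3):
    print(d, inner(f, dual_function(f, d)), gowers_power(f, d))  # equal, as in (6.6)
```

The same helpers also confirm the Fourier expression for $\mathcal{D}_2 f$ and the trivial bound $|\mathcal{D}_d f| \le 1$ when $|f| \le 1$.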
One should compare the above facts with the situation in the Fourier-analytic argument. In that argument, the role of dual functions was played by the linear phase functions $e(\alpha n)$, which are certainly bounded. A polynomial combination of linear phase functions is nothing more than a trigonometric polynomial, and (4.6) then shows that $\nu_b - 1$ is indeed mostly orthogonal to such polynomial combinations.

To exploit these facts about dual functions, we need to introduce the machinery of σ-algebras and conditional expectation.

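Definition 6.3. A σ-algebra is a collection $\mathcal{B}$ of subsets of $\mathbb{Z}/N\mathbb{Z}$ which contains $\emptyset$ and $\mathbb{Z}/N\mathbb{Z}$ and is closed under union, intersection, and complementation. A function $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ is $\mathcal{B}$-measurable if all its level sets lie in $\mathcal{B}$. If $\mathcal{B}$ is a σ-algebra and $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$, we define the conditional expectation $\mathbb{E}(f|\mathcal{B}): \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ of $f$ with respect to $\mathcal{B}$ to be the function

$$\mathbb{E}(f|\mathcal{B})(x) := \mathbb{E}(f \mid \mathcal{B}(x)) = \frac{1}{|\mathcal{B}(x)|} \sum_{y \in \mathcal{B}(x)} f(y)$$

for all $x \in \mathbb{Z}/N\mathbb{Z}$, where $\mathcal{B}(x)$ is the smallest set in $\mathcal{B}$ which contains $x$. If $\mathcal{B}_1, \mathcal{B}_2$ are two σ-algebras, we use $\mathcal{B}_1 \vee \mathcal{B}_2$ to denote the smallest σ-algebra which contains both $\mathcal{B}_1$ and $\mathcal{B}_2$.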
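On the finite set $\mathbb{Z}/N\mathbb{Z}$ a σ-algebra is determined by its atoms, which makes the definition easy to implement. In the sketch below (the atom-label encoding and the function names are assumptions made for illustration), `atom_of[x]` is a label for $\mathcal{B}(x)$, and the join $\mathcal{B}_1 \vee \cdots \vee \mathcal{B}_m$ is encoded by tuples of labels, so that two points share an atom exactly when they do in every $\mathcal{B}_j$. Three properties used repeatedly in this section are easy to confirm on such examples: $\mathbb{E}(f|\mathcal{B})$ has the same mean as $f$, $f - \mathbb{E}(f|\mathcal{B})$ is orthogonal to every $\mathcal{B}$-measurable function, and refining $\mathcal{B}$ can only increase the energy $\mathbb{E}(|\mathbb{E}(f|\mathcal{B})|^2)$.

```python
def conditional_expectation(f, atom_of):
    """E(f|B) for a sigma-algebra B on Z/NZ encoded by atom labels.

    atom_of[x] is a hashable label of B(x), the smallest set of B containing x;
    E(f|B)(x) is then the average of f over B(x).
    """
    sums, counts = {}, {}
    for x, label in enumerate(atom_of):
        sums[label] = sums.get(label, 0) + f[x]
        counts[label] = counts.get(label, 0) + 1
    return [sums[label] / counts[label] for label in atom_of]


def join(N, *atom_lists):
    """Atoms of B_1 v ... v B_m: x and y share an atom iff they do in every B_j.

    With no atom lists this returns the trivial sigma-algebra {empty set, Z/NZ}.
    """
    return [tuple(atoms[x] for atoms in atom_lists) for x in range(N)]


N = 12
f = [((7 * x * x + 3) % 11) / 10 for x in range(N)]
mod3 = [x % 3 for x in range(N)]
mod2 = [x % 2 for x in range(N)]
Ef = conditional_expectation(f, join(N, mod3, mod2))   # atoms: residue classes mod 6
print(Ef)
```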

A basic fact in measure theory is that any algebra of functions generates a σ-algebra. The estimate (6.7) asserts, morally speaking, that $\nu_b - 1$ is asymptotically orthogonal to the algebra generated by dual functions, and thus should also be orthogonal to the σ-algebra generated by dual functions. Indeed, we can make this precise as follows. Given any dual function $\mathcal{D}_{k-1}(f)$ and any cutoff $\varepsilon > 0$, we can generate a σ-algebra $\mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f))$ by partitioning the complex plane $\mathbb{C}$ into squares of side length $\varepsilon$, and using the inverse images of these squares under $\mathcal{D}_{k-1}(f)$ as the atoms of the σ-algebra. There is some choice in how to choose this partition; a random translation of the standard partition will work here. A key result in [?] is then that for any $m \ge 1$ and any functions $f_1, \ldots, f_m$ bounded in magnitude by $\nu_b + 1$, we have the uniform distribution property

$$\mathbb{E}(\nu_b - 1 \mid \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_1)) \vee \cdots \vee \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_m)))(x) = o(1) \qquad (6.8)$$

for all $x \in \mathbb{Z}/N\mathbb{Z}$, except on an exceptional set $\Omega$ which is small in the sense that

$$\mathbb{E}((\nu_b + 1) 1_\Omega) = o(1).$$

This claim can be derived fairly quickly from (6.7) and the Weierstrass approximation theorem²⁵; see [?].
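The construction of $\mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f))$ can be mimicked for any bounded function $g$. The sketch below (the names, the seed and the test function are illustrative assumptions) builds the atoms from a randomly translated grid of $\varepsilon$-squares. One consequence worth noting is that $g$ lies within $\varepsilon\sqrt{2}$ of its own conditional expectation on $\mathcal{B}_\varepsilon(g)$, which is the precise sense in which $g$ is measurable "modulo negligible errors" with respect to this σ-algebra.

```python
import math
import random


def epsilon_atoms(g, eps, rng):
    """Atoms of B_eps(g) for g: Z/NZ -> C given as a list.

    C is cut into squares of side eps, translated by a random offset drawn from rng;
    x and y share an atom iff g(x) and g(y) lie in the same square.
    """
    ox, oy = rng.uniform(0, eps), rng.uniform(0, eps)
    return [(math.floor((z.real - ox) / eps), math.floor((z.imag - oy) / eps)) for z in g]


def conditional_expectation(f, atom_of):
    """Average of f over the atom containing each point."""
    sums, counts = {}, {}
    for x, label in enumerate(atom_of):
        sums[label] = sums.get(label, 0) + f[x]
        counts[label] = counts.get(label, 0) + 1
    return [sums[label] / counts[label] for label in atom_of]


N, eps = 50, 0.1
g = [complex(math.cos(x * x / 7), math.sin(3 * x) / 2) for x in range(N)]
atoms = epsilon_atoms(g, eps, random.Random(0))
Eg = conditional_expectation(g, atoms)
# g is measurable "up to eps * sqrt(2)" with respect to its own B_eps(g):
print(max(abs(a - b) for a, b in zip(g, Eg)), eps * math.sqrt(2))
```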

We can now sketch the proof of Theorem 6.2. As mentioned earlier, the idea is to detect any obstructions to uniformity in $f_U$ (in the guise of dual functions $\mathcal{D}_{k-1}(f_1), \ldots, \mathcal{D}_{k-1}(f_m)$, where $f_1, \ldots, f_m$ are bounded in magnitude by $\nu_b + 1$) and transfer them to $f_{U^\perp}$ one at a time. Oversimplifying somewhat (in particular, glossing over the role of the exceptional set $\Omega$), the algorithm for doing so is as follows:

• Step 0. Set $m = 0$.

• Step 1. Set $f_{U^\perp} := \mathbb{E}(f \mid \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_1)) \vee \cdots \vee \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_m)))$ (so initially we would just have $f_{U^\perp} = \mathbb{E}(f)$), and then set $f_U := f - f_{U^\perp}$. Clearly $f_{U^\perp}$ is non-negative and has the same mean as $f$; from (6.8) we ensure that $f_{U^\perp}$ is bounded.

• Step 2. If $f_U$ is Gowers uniform, in the sense that $\|f_U\|_{U^{k-1}(\mathbb{Z}/N\mathbb{Z})} \le \varepsilon$, then we are done. Otherwise, we set $f_{m+1} := f_U$, increment $m$ by 1, and return to Step 1.

²⁵ As our functions here are complex valued, we have to consider polynomials which involve the conjugates of the dual functions $\mathcal{D}_{k-1}(f_j)$ as well as the dual functions themselves, but this does not cause any additional difficulty.
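To make Steps 0-2 concrete, here is a toy sketch of the iteration in the simplest setting: $k = 3$ (so the relevant norm is $U^2$ and the dual function is $\mathcal{D}_2$, computed from its Fourier expansion) and a function $f: \mathbb{Z}/N\mathbb{Z} \to [0,1]$, that is, with $\nu_b$ replaced by 1, so that no exceptional sets arise and the boundedness of $f_{U^\perp}$ is automatic. Everything here (the names, the grid offsets drawn from a seeded random generator, the `max_steps` cap) is an assumption of the sketch; in particular the cap is needed because this naive version does not reproduce the careful quantitative energy-increment bookkeeping of the actual proof.

```python
import cmath
import math
import random


def fourier(f):
    """hat f(xi) = E(f(n) e(-xi n / N) | n in Z/NZ)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * xi * n / N) for n in range(N)) / N
            for xi in range(N)]


def u2_norm(f):
    return sum(abs(c) ** 4 for c in fourier(f)) ** 0.25


def dual2(f):
    """D_2 f(n) = sum_xi |hat f(xi)|^2 hat f(xi) e(xi n / N)."""
    N, fh = len(f), fourier(f)
    return [sum(abs(c) ** 2 * c * cmath.exp(2j * cmath.pi * xi * n / N)
                for xi, c in enumerate(fh)) for n in range(N)]


def epsilon_atoms(g, eps, rng):
    """Atoms of B_eps(g): preimages of a randomly translated grid of eps-squares."""
    ox, oy = rng.uniform(0, eps), rng.uniform(0, eps)
    return [(math.floor((z.real - ox) / eps), math.floor((z.imag - oy) / eps)) for z in g]


def conditional_expectation(f, atom_of):
    sums, counts = {}, {}
    for x, label in enumerate(atom_of):
        sums[label] = sums.get(label, 0.0) + f[x]
        counts[label] = counts.get(label, 0) + 1
    return [sums[label] / counts[label] for label in atom_of]


def toy_structure_theorem(f, eps, max_steps=50, seed=0):
    """Steps 0-2 for k = 3, for f: Z/NZ -> [0, 1] (i.e. nu_b replaced by 1).

    Returns (f_U, f_perp, energies), where energies[j] = E(|f_perp|^2) at pass j.
    """
    rng = random.Random(seed)
    N = len(f)
    obstructions = []      # Step 0: m = 0, so no atom lists B_eps(D_2(f_j)) yet
    energies = []
    for _ in range(max_steps):
        # Step 1: condition on the join of the sigma-algebras found so far.
        atoms = [tuple(a[x] for a in obstructions) for x in range(N)]
        f_perp = conditional_expectation(f, atoms)
        f_U = [a - b for a, b in zip(f, f_perp)]
        energies.append(sum(v * v for v in f_perp) / N)
        # Step 2: stop if f_U is uniform; otherwise f_{m+1} := f_U.
        if u2_norm(f_U) <= eps:
            break
        obstructions.append(epsilon_atoms(dual2(f_U), eps, rng))
    return f_U, f_perp, energies


N = 31
f = [1.0 if (x * x) % N < N // 2 else 0.0 for x in range(N)]
f_U, f_perp, energies = toy_structure_theorem(f, eps=0.2)
print(len(energies), energies[-1], u2_norm(f_U))
```

By construction the σ-algebras only get finer from one pass to the next, so the recorded energies are non-decreasing, while $f_{U^\perp}$ keeps the mean of $f$ and stays in $[0,1]$.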

It turns out that every time we return from Step 2 to Step 1, the energy $\mathbb{E}(|f_{U^\perp}|^2)$ of $f_{U^\perp}$ increases by at least $c_{\varepsilon,k}$ (plus some negligible $o(1)$ errors), where $c_{\varepsilon,k} > 0$ is an explicit positive quantity depending only on $\varepsilon$ and $k$; see [26]. Intuitively, the reason for this is as follows. If $f_U$ is not Gowers uniform, then by (6.6) $f_U$ has a large correlation with $\mathcal{D}_{k-1}(f_U) = \mathcal{D}_{k-1}(f_{m+1})$. But $f_U$, by construction, is orthogonal to all the functions which are measurable with respect to the σ-algebra $\mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_1)) \vee \cdots \vee \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_m))$, while $\mathcal{D}_{k-1}(f_{m+1})$ lies (modulo negligible errors) in the larger σ-algebra $\mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_1)) \vee \cdots \vee \mathcal{B}_\varepsilon(\mathcal{D}_{k-1}(f_{m+1}))$. The energy increment then follows (morally, at least) from the following simple lemma:

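Lemma 6.4 (Correlation implies energy increment). Let $\mathcal{B} \subseteq \mathcal{B}'$ be σ-algebras, and let $f, g$ be functions such that $f$ is orthogonal to all $\mathcal{B}$-measurable functions, while $g$ is $\mathcal{B}'$-measurable and bounded in magnitude by 1. Then we have the energy increment

$$\mathbb{E}(|\mathbb{E}(f|\mathcal{B}')|^2) \ge \mathbb{E}(|\mathbb{E}(f|\mathcal{B})|^2) + |\langle f, g\rangle|^2.$$

Proof. From the $\mathcal{B}'$-measurability of $g$ we have

$$\langle f, g\rangle = \langle \mathbb{E}(f|\mathcal{B}'), g\rangle.$$

Also, since $f$ is orthogonal to all $\mathcal{B}$-measurable functions, we have $\mathbb{E}(f|\mathcal{B}) = 0$. Thus

$$\langle f, g\rangle = \langle \mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B}), g\rangle.$$

Applying Cauchy-Schwarz and the boundedness of $g$ we conclude

$$\mathbb{E}(|\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B})|^2) \ge |\langle f, g\rangle|^2,$$

and the claim then follows from Pythagoras' theorem. □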
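Lemma 6.4 is easy to test on random data. In the sketch below (all choices are illustrative), $\mathcal{B}$ and $\mathcal{B}'$ are generated by residues modulo 3 and modulo 6 on $\mathbb{Z}/30\mathbb{Z}$, $f$ is made orthogonal to $\mathcal{B}$ by subtracting $\mathbb{E}(f|\mathcal{B})$, and $g$ is a random $\mathcal{B}'$-measurable function bounded by 1; the functions are real-valued, so the inner product needs no conjugation.

```python
import random


def conditional_expectation(f, atom_of):
    sums, counts = {}, {}
    for x, label in enumerate(atom_of):
        sums[label] = sums.get(label, 0.0) + f[x]
        counts[label] = counts.get(label, 0) + 1
    return [sums[label] / counts[label] for label in atom_of]


def energy(f):
    return sum(v * v for v in f) / len(f)


def lemma_6_4_sides(N, coarse, fine, rng):
    """Random real instance of Lemma 6.4; returns (LHS, RHS) of the energy increment."""
    raw = [rng.uniform(-1, 1) for _ in range(N)]
    # Subtracting E(raw|B) makes f orthogonal to all B-measurable functions.
    f = [a - b for a, b in zip(raw, conditional_expectation(raw, coarse))]
    # g is B'-measurable (constant on atoms of B') and bounded by 1.
    value_on_atom = {label: rng.uniform(-1, 1) for label in sorted(set(fine))}
    g = [value_on_atom[label] for label in fine]
    inner = sum(a * b for a, b in zip(f, g)) / N
    lhs = energy(conditional_expectation(f, fine))
    rhs = energy(conditional_expectation(f, coarse)) + inner ** 2
    return lhs, rhs


N = 30
coarse = [x % 3 for x in range(N)]   # B
fine = [x % 6 for x in range(N)]     # B', a refinement of B
print(lemma_6_4_sides(N, coarse, fine, random.Random(0)))
```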
In practice, we cannot quite use this simple lemma because of the presence of the exceptional sets $\Omega$, but it is still possible to obtain the energy increment by carefully modifying the above argument; see [?].

Observe that the energy $\mathbb{E}(|f_{U^\perp}|^2)$ increases by a fixed amount at each stage of the iteration, but remains bounded independently of the number of steps of the iteration (ignoring some negligible $o(1)$ type errors). Thus the algorithm can only run for a bounded number of steps, which keeps all the $o(1)$ errors under control. After doing all the book-keeping, one eventually arrives at a proof of Theorem 6.2; see [?] for the full details. As discussed earlier, this is enough to establish that the primes contain arbitrarily long arithmetic progressions; the same argument also shows that any subset of the primes of positive relative density contains arbitrarily long arithmetic progressions.
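Written out, and ignoring the exceptional sets as above, the counting goes as follows. If Step 2 sends us back to Step 1 $m$ times, the energy has grown by at least $m(c_{\varepsilon,k} - o(1))$ from its initial non-negative value, while the bound $0 \le f_{U^\perp} \le 1 + o(1)$ maintained in Step 1 caps it, so

$$m\,(c_{\varepsilon,k} - o(1)) \le \mathbb{E}(|f_{U^\perp}|^2) \le \sup_n |f_{U^\perp}(n)|^2 \le (1 + o(1))^2, \qquad\text{hence}\qquad m \le \frac{(1 + o(1))^2}{c_{\varepsilon,k} - o(1)} = O_{\varepsilon,k}(1)$$

as soon as the $o(1)$ errors are smaller than, say, $c_{\varepsilon,k}/2$.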
One can also follow through the argument carefully to eventually yield a lower bound

$$\mathbb{E}(\Lambda(a) \cdots \Lambda(a + (k-1)r) \mid 1 \le a, r \le N) \ge c(k) - o_{N\to\infty;k}(1)$$

for some explicitly computable $c(k) > 0$; the exact value is rather poor, depending on the quantitative error bounds in the correlation estimates (3.3), (3.4) as well as on the constant in Theorem 6.1.
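For small $N$ the average on the left-hand side can be computed directly. The sketch below (plain Python; $N = 300$ and $k \in \{3, 4\}$ are arbitrary choices) does so. Such computations say nothing about the constant $c(k)$, but they show the normalization at work: each progression of primes or prime powers in the range contributes the product of the logarithms of its terms.

```python
import math


def von_mangoldt_table(M):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= M."""
    is_prime = bytearray([1]) * (M + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(M) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, M + 1, p)))
    lam = [0.0] * (M + 1)
    for p in range(2, M + 1):
        if is_prime[p]:
            power = p
            while power <= M:
                lam[power] = math.log(p)
                power *= p
    return lam


def ap_average(lam, k, N):
    """E(Lambda(a) Lambda(a+r) ... Lambda(a+(k-1)r) | 1 <= a, r <= N); needs len(lam) > k*N."""
    total = 0.0
    for a in range(1, N + 1):
        if lam[a] == 0.0:
            continue
        for r in range(1, N + 1):
            term = lam[a]
            for j in range(1, k):
                term *= lam[a + j * r]
                if term == 0.0:
                    break
            total += term
    return total / N ** 2


N = 300
lam = von_mangoldt_table(4 * N)
for k in (3, 4):
    print(k, ap_average(lam, k, N))
```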
7. Further directions

The transference methods here should be applicable to some other situations. For instance, a variant of the above argument was used recently in [?] to show that the Gaussian primes in $\mathbb{Z}[i]$ contain infinitely many constellations of any prescribed shape and orientation; one needs to replace Szemeredi's theorem by the somewhat stronger "hypergraph removal lemma" of Gowers [22] and Rödl-Skokan [?], [?] (see also [?]), and the presence of the conjugation operation $z \mapsto \overline{z}$ in the Galois group $\mathrm{Gal}(\mathbb{Q}[i]/\mathbb{Q})$ causes some technical difficulties, but otherwise the strategy is almost identical. We refer the reader to [47] and [?] for further details. Similar results should also hold for other number fields that enjoy unique factorization. For instance, one should be able to show that given any finite field $F$, the monic irreducible polynomials of one variable in $F[x]$ contain affine subspaces over $F$ of arbitrarily high dimension.

A more challenging extension would be to obtain a multidimensional relative Szemeredi theorem, which would assert that, given any dimension $d \ge 1$ and the set of primes $P = \{2, 3, 5, \ldots\}$, any subset of $P^d$ of positive relative density should contain infinitely many constellations of any prescribed shape and orientation. For $P^d$ replaced by $\mathbb{Z}^d$, this result was proven in [?], and it also follows from the hypergraph removal lemma mentioned briefly earlier. A major new difficulty here is that the natural enveloping sieve for $P^d$ is not very pseudorandom, even after applying the higher-dimensional analogue of the W-trick; the lack of pseudorandomness, even for $P^2$, can be seen by the observation that if the two acute corners of a right-angled triangle (with sides parallel to the axes) lie in $P^2$, then the third corner also automatically lies in $P^2$, despite $P^2$ being quite sparse. We do not know how to resolve this problem.

It should also be possible to establish arbitrarily long progressions $a, a+r, \ldots, a+(k-1)r$ in the primes (or in any positive relative density subset thereof) in which the spacing $r$ is significantly smaller than the base point $a$, obtaining for instance progressions such that $r = O_{k,\varepsilon}(a^\varepsilon)$ for any given $\varepsilon > 0$. This is likely to follow by localizing the above theory to intervals of length $O(N^\varepsilon)$ in $\{N+1, \ldots, 2N\}$.

A more difficult result would be to obtain a polynomial Szemeredi theorem for the primes. More precisely, if $P_1, \ldots, P_k: \mathbb{Z} \to \mathbb{Z}$ were any polynomials mapping the integers to the integers with $P_1(0) = \ldots = P_k(0) = 0$, then there should be infinitely many $k$-tuplets $a + P_1(r), \ldots, a + P_k(r)$ with $r \ne 0$, such that all the $a + P_j(r)$ are prime. If the primes were replaced by a positive density subset of $\mathbb{Z}$, then this result was obtained by Bergelson and Leibman [?]. If one wished to localize this problem to $\mathbb{Z}/N\mathbb{Z}$, it would be necessary to restrict $r$ to be at most a small power of $N$, and so one may first have to understand the previous problem concerning progressions with small spacing before tackling this problem. The hypothesis $P_1(0) = \ldots = P_k(0) = 0$ unfortunately seems to be rather crucial to the method (for instance, one can easily construct counterexamples to the Bergelson-Leibman theorem without this hypothesis), which is a pity, as one would otherwise have a route to prove such conjectures as the twin prime conjecture or, more generally, the Hardy-Littlewood prime tuples conjecture.

Another problem (communicated by Vitaly Bergelson) which might now be feasible is to establish that the set $P - 1 = \{1, 2, 4, 6, 10, \ldots\}$, formed by subtracting one from each prime, is an IP set, or more precisely that given any $k$ there exist distinct $a_1, \ldots, a_k$ such that the finite sums $\sum_{j \in J} a_j$, for all non-empty $J \subseteq \{1, \ldots, k\}$, are contained in $P - 1$. The case $k = 2$ can be handled by the circle method, but the higher $k$ remain open. Such a result would then lead to a number of combinatorial consequences; see for instance [7] for further discussion.
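For fixed small $k$ the finite question can at least be explored by brute force. The sketch below (the function names and search bounds are illustrative) looks for distinct $a_1 < \cdots < a_k$ in $P - 1$ all of whose non-empty subset sums also lie in $P - 1$. For example, $(a_1, a_2, a_3) = (6, 12, 60)$ works for $k = 3$: its seven subset sums $6, 12, 60, 18, 66, 72, 78$ are each one less than a prime. Finding examples for small $k$ is, of course, a very different matter from proving that they exist for every $k$.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers searched here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def all_subset_sums_in_P_minus_1(a):
    """True if sum_{j in J} a_j + 1 is prime for every non-empty J."""
    k = len(a)
    for mask in range(1, 1 << k):
        s = sum(a[j] for j in range(k) if mask >> j & 1)
        if not is_prime(s + 1):
            return False
    return True


def find_ip_configuration(k, bound):
    """Search for a_1 < ... < a_k <= bound with all non-empty subset sums in P - 1.

    Partial choices are pruned as soon as one of their subset sums fails,
    which is valid because every subset of a solution is again a solution.
    Returns a tuple, or None if nothing exists below the bound.
    """
    candidates = [n for n in range(1, bound + 1) if is_prime(n + 1)]

    def extend(chosen, start):
        if len(chosen) == k:
            return tuple(chosen)
        for i in range(start, len(candidates)):
            trial = chosen + [candidates[i]]
            if all_subset_sums_in_P_minus_1(trial):
                found = extend(trial, i + 1)
                if found is not None:
                    return found
        return None

    return extend([], 0)


print(find_ip_configuration(2, 20))   # (2, 4): the sums 2, 4, 6 are one less than 3, 5, 7
print(find_ip_configuration(3, 100))
```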